ai-edge-contest-4th's Issues
Quantize the model
$ freeze_graph --help
usage: freeze_graph [-h] [--input_graph INPUT_GRAPH]
[--input_saver INPUT_SAVER]
[--input_checkpoint INPUT_CHECKPOINT]
[--checkpoint_version CHECKPOINT_VERSION]
[--output_graph OUTPUT_GRAPH]
[--input_binary [INPUT_BINARY]]
[--output_node_names OUTPUT_NODE_NAMES]
[--restore_op_name RESTORE_OP_NAME]
[--filename_tensor_name FILENAME_TENSOR_NAME]
[--clear_devices [CLEAR_DEVICES]]
[--initializer_nodes INITIALIZER_NODES]
[--variable_names_whitelist VARIABLE_NAMES_WHITELIST]
[--variable_names_blacklist VARIABLE_NAMES_BLACKLIST]
[--input_meta_graph INPUT_META_GRAPH]
[--input_saved_model_dir INPUT_SAVED_MODEL_DIR]
[--saved_model_tags SAVED_MODEL_TAGS]
optional arguments:
-h, --help show this help message and exit
--input_graph INPUT_GRAPH
TensorFlow 'GraphDef' file to load.
--input_saver INPUT_SAVER
TensorFlow saver file to load.
--input_checkpoint INPUT_CHECKPOINT
TensorFlow variables file to load.
--checkpoint_version CHECKPOINT_VERSION
Tensorflow variable file format
--output_graph OUTPUT_GRAPH
Output 'GraphDef' file name.
--input_binary [INPUT_BINARY]
Whether the input files are in binary format.
--output_node_names OUTPUT_NODE_NAMES
The name of the output nodes, comma separated.
--restore_op_name RESTORE_OP_NAME
The name of the master restore operator. Deprecated,
unused by updated loading code.
--filename_tensor_name FILENAME_TENSOR_NAME
The name of the tensor holding the save path.
Deprecated, unused by updated loading code.
--clear_devices [CLEAR_DEVICES]
Whether to remove device specifications.
--initializer_nodes INITIALIZER_NODES
Comma separated list of initializer nodes to run
before freezing.
--variable_names_whitelist VARIABLE_NAMES_WHITELIST
Comma separated list of variables to convert to
constants. If specified, only those variables will be
converted to constants.
--variable_names_blacklist VARIABLE_NAMES_BLACKLIST
Comma separated list of variables to skip converting
to constants.
--input_meta_graph INPUT_META_GRAPH
TensorFlow 'MetaGraphDef' file to load.
--input_saved_model_dir INPUT_SAVED_MODEL_DIR
Path to the dir with TensorFlow 'SavedModel' file and
variables.
--saved_model_tags SAVED_MODEL_TAGS
Group of tag(s) of the MetaGraphDef to load, in string
format, separated by ','. For tag-set contains
multiple tags, all tags must be passed in.
$ vai_q_tensorflow --help
usage:
usage: vai_q_tensorflow command [Options]
examples:
show help : vai_q_tensorflow --help
quantize a model: vai_q_tensorflow quantize --input_frozen_graph frozen_graph.pb --input_nodes xxx --output_nodes yyy --input_shapes zzz --input_fn module.calib_input
inspect a model : vai_q_tensorflow inspect --input_frozen_graph frozen_graph.pb
dump quantized model : vai_q_tensorflow dump --input_frozen_graph quantize_results/quantize_eval_model.pb --input_fn module.dump_input
Xilinx's Quantization Tools Vai_q_tensorflow v1.0.0 Build for Tensorflow
1.12.0
positional arguments:
{quantize,inspect,dump}
Specify a command for vai_q_tensorflow
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
--input_frozen_graph INPUT_FROZEN_GRAPH
The path to input frozen graph(.pb) (default: )
--output_dir OUTPUT_DIR
The directory to save the quantization results
(default: ./quantize_results)
--weight_bit WEIGHT_BIT
The target bit width for weights/biases (default: 8)
--activation_bit ACTIVATION_BIT
The target bit width for activation (default: 8)
--method {0,1} The method for quantization, options are: 0: non-
overflow method, make sure no values are saturated
during quantization, may get bad results incase of
outliers. 1: min-diffs method, allow saturation for
large values during quantization to get smaller
quantization errors. This method is slower than method
0 but has higher endurance to outliers. (default: 1)
--calib_iter CALIB_ITER
The iterations of calibration, total number of images
for calibration = calib_iter * batch_size (default:
100)
--input_nodes INPUT_NODES
The name list of input nodes of the subgraph to be
quantized, comma separated. Used together with
output_nodes. When generating the model for deploy,
only the subgraph between input_nodes and output_nodes
will be included. Please set it to the begining of the
main body fo the model to quantize, such as the nodes
after data preprocessing and augmentation. (default: )
--input_shapes INPUT_SHAPES
The shape list of input_nodes, The shape must be a
4-dimension shape for each node, comma separated, e.g.
1,224,224,3; Unknown size for batchsize is supported,
e.g. ?,224,224,3; In case of multiple input_nodes,
please assign the shape list of each node, separated
by `:`. e.g. ?,224,224,3:?,300,300,1 (default: )
--output_nodes OUTPUT_NODES
The name list of output nodes of the subgraph to be
quantized, comma separated. Used together with
input_nodes. When generating the model for deploy,
only the subgraph between input_nodes and output_nodes
will be included. Please set it to the end of the main
body of the model to quantize, such as the nodes
before postprocessing. (default: )
--ignore_nodes IGNORE_NODES
The name list of nodes to be ignored during
quantization, comma separated. The ignored nodes will
be left unquantized during quantization even if it is
quantizable. This argument has no effect for non-
quantizable nodes. (default: )
--skip_check {0,1} Set to 1 to skip the check for float model. (default:
0)
--align_concat {0,1,2}
The strategy for alignment of the input quantize
positions for concat nodes. Set to 0 to align all
concat nodes, 1 to align the output concat nodes, 2 to
disable alignment (default: 0)
--simulate_dpu {0,1} Set to 1 to enable simulation of DPU. The behavior of
DPU for some operations are different from tensorflow.
For example, the dividing in LeakyRelu and AvgPooling
are replaced by bit-shifting, so there maybe slight
difference between DPU outputs and CPU/GPU outputs.
This quantizer will simulate the behavior for these
operations if this flag is set to 1 (default: 1)
--input_fn INPUT_FN The python importable function that provides the input
data. The format is `module_name.input_fn_name`, e.g.
'my_input_fn.input_fn'. The input_fn should take a
`int` object as input indicating the calibration step,
and should return a dict`(placeholder_node_name :
numpy.Array)` object for each call, which will be fed
into the model's placeholder nodes. (default: )
--max_dump_batches MAX_DUMP_BATCHES
The maximum batches to be dumped (default: 1)
--dump_float {0,1} Set to 1 to dump the float weights/biases and
activation tensors together with the quantized
tensors. (default: 0)
--gpu GPU The gpu id used for quantization, comma separated.
(default: 0)
--gpu_memory_fraction GPU_MEMORY_FRACTION
The gpu memory fraction used for quantization, between
0-1. (default: 0.5)
Try FastFCN
Try the segmentation models in the Vitis-AI-Library
Install the required tools on the AWS instance
For the Vitis install, I referred to this page.
For the X server, see this site.
It is already installed on the AWS instance, so there is no need to redo this step.
xserver
sudo apt install xserver-xorg
Append the following to /etc/ssh/ssh_config:
ForwardX11 yes
ForwardX11Trusted yes
Restart sshd:
sudo service sshd restart
Installing Vitis
Download the self-extracting Linux unified installer from the Xilinx download page and copy it over with scp or similar.
Installing through the GUI triggers a Java error, so do it from the CLI instead.
mkdir vitis-install
./Xilinx_Unified_2020.1_0602_1208_Lin64.bin --noexec --target ~/vitis-install
cd vitis-install
sudo ./xsetup -b AuthTokenGen  # enter your Xilinx account credentials
./xsetup -b ConfigGen  # select Vitis (option 1)
sudo ./xsetup --agree XilinxEULA,3rdPartyEULA,WebTalkTerms --batch Install --config <path to the generated config file>
source /tools/Xilinx/Vivado/2020.1/settings64.sh
vivado
If Vivado comes up through the X server, the install succeeded.
Vivado fails to start with an error
application-specific initialization failed: couldn't load file "librdi_commontasks.so": libtinfo.so.5: cannot open shared object file: No such file or directory
If this error appears, then, following this reference, run:
sudo apt update
sudo apt install libtinfo-dev
sudo ln -s /lib/x86_64-linux-gnu/libtinfo.so.6 /lib/x86_64-linux-gnu/libtinfo.so.5
which fixed it.
Try FasterSeg
Measure FasterSeg's accuracy and speed on the SIGNATE dataset.
https://github.com/VITA-Group/FasterSeg
Set up the Vitis AI environment on AWS
Create a frozen graph from an ONNX model, via a (TensorFlow) .pb file
First, produce a TensorFlow GraphDef (.pb file) from the ONNX model.
Use onnx-tensorflow from the tf-1.x branch (https://github.com/onnx/onnx-tensorflow/tree/tf-1.x).
Following https://github.com/onnx/onnx-tensorflow/blob/tf-1.x/example/onnx_to_tf.py in the onnx-tensorflow repository, run:
import onnx
from onnx_tf.backend import prepare
onnx_model = onnx.load("path/to/model.onnx") # load onnx model
tf_rep = prepare(onnx_model) # prepare tf representation
tf_rep.export_graph("path/to/output_model.pb") # export the model
Running this produces the .pb file.
Note the TensorFlow version of the environment this code runs in: if the version that created the .pb file differs from the version that later loads it, you get errors such as tensorflow/tensorflow#22994 or the following:
Traceback (most recent call last):
File "view_graph.py", line 44, in <module>
graph = load_model()
File "view_graph.py", line 39, in load_model
tf.import_graph_def(graph_def, name="")
File "/home/suzuki/.pyenv/versions/miniconda3-latest/envs/oxnn/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/suzuki/.pyenv/versions/miniconda3-latest/envs/oxnn/lib/python3.7/site-packages/tensorflow/python/framework/importer.py", line 430, in import_graph_def
raise ValueError(str(e))
ValueError: NodeDef mentions attr 'incompatible_shape_error' not in Op<name=Equal; signature=x:T, y:T -> z:bool; attr=T:type,allowed=[DT_BFLOAT16, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_UINT8, ..., DT_QINT8, DT_QINT32, DT_STRING, DT_BOOL, DT_COMPLEX128]; is_commutative=true>; NodeDef: {{node assert_equal_1/Equal}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
Errors like these occur.
python jit_test.py Illegal instruction (core dumped)
# packages in environment at /opt/vitis_ai/conda/envs/vitis-ai-pytorch:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
ca-certificates 2020.10.14 0
cairo 1.14.12 h8948797_3
certifi 2020.6.20 pyhd3eb1b0_3
cffi 1.14.0 py36h2e261b9_0
cudatoolkit 10.0.130 0
cudnn 7.6.5 cuda10.0_0
expat 2.2.10 he6710b0_2
ffmpeg 4.0 hcdf2ecd_0
fontconfig 2.13.1 h2176d3f_1000 conda-forge/label/gcc7
freeglut 3.0.0 hf484d3e_5
freetype 2.10.4 h5ab3b9f_0
gettext 0.19.8.1 hd7bead4_3
gflags 2.2.2 he6710b0_0
glib 2.56.2 hd408876_0
glog 0.4.0 he6710b0_0
graphite2 1.3.14 h23475e2_0
graphviz 2.38.0 hcf1ce16_1009 conda-forge/label/gcc7
harfbuzz 1.9.0 he243708_1001 conda-forge/label/gcc7
hdf5 1.10.2 hba1933b_1
icu 58.2 he6710b0_3
intel-openmp 2020.2 254
jasper 2.0.14 h07fcdf6_1
jpeg 9c h14c3975_1001 conda-forge/label/gcc7
json-c 0.13.1 h1bed415_0
lcms2 2.11 h396b838_0
ld_impl_linux-64 2.33.1 h53a641e_7
libedit 3.1.20191231 h14c3975_1
libffi 3.2.1 hf484d3e_1007
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libglu 9.0.0 hf484d3e_1
libopencv 3.4.2 hb342d67_1
libopus 1.3.1 h7b6447c_0
libpng 1.6.37 hbc83047_0
libprotobuf 3.11.4 hd408876_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_1
libtool 2.4.6 h7b6447c_1005
libuuid 2.32.1 h14c3975_1000 conda-forge/label/gcc7
libvpx 1.7.0 h439df22_0
libxcb 1.14 h7b6447c_0
libxml2 2.9.10 hb55368b_3
lz4-c 1.9.2 heb0550a_3
mkl 2020.2 256
mkl-service 2.3.0 py36he904b0f_0
mkl_fft 1.2.0 py36h23d657b_0
mkl_random 1.1.1 py36h0573a6f_0
ncurses 6.2 he6710b0_1
ninja 1.10.1 py36hfd86e86_0
numpy 1.17.2 py36haad9e8e_0
numpy-base 1.17.2 py36hde5b4d6_0
olefile 0.46 py36_0
opencv 3.4.2 py36h6fd60c2_1
openssl 1.1.1h h7b6447c_0
pango 1.40.14 hf0c64fd_1003 conda-forge/label/gcc7
pcre 8.44 he6710b0_0
pillow 8.0.1 py36he98fc37_0
pip 20.2.4 py36h06a4308_0
pixman 0.40.0 h7b6447c_0
protobuf 3.11.4 py36he6710b0_0
py-opencv 3.4.2 py36hb342d67_1
pybind11 2.5.0 py36hfd86e86_0
pycparser 2.20 py_2
python 3.6.10 hcf32534_1
pytorch 1.1.0 cuda100py36he554f03_0
pytorch_nndct 1.2.0 4_a5f1f456 file:///scratch/conda-channel
readline 8.0 h7b6447c_0
scipy 1.3.1 py36h7c811a0_0
setuptools 50.3.0 py36h06a4308_1
six 1.15.0 py_0
sqlite 3.33.0 h62c20be_0
target_factory 1.2.0 10_65a73cb6 file:///scratch/conda-channel
tk 8.6.10 hbc83047_0
torchvision 0.3.0 cuda100py36h72fc40a_0
tqdm 4.50.2 py_0
unilog 1.2.0 10_4f1575a6 file:///scratch/conda-channel
vart 1.2.0 16_a7d6128b file:///scratch/conda-channel
wheel 0.35.1 py_0
xir 1.2.0 12_69d7e69c file:///scratch/conda-channel
xorg-kbproto 1.0.7 h14c3975_1002 conda-forge/label/gcc7
xorg-libice 1.0.9 h14c3975_1004 conda-forge/label/gcc7
xorg-libsm 1.2.3 h4937e3b_1000 conda-forge/label/gcc7
xorg-libx11 1.6.6 h14c3975_1000 conda-forge/label/gcc7
xorg-libxext 1.3.3 h14c3975_1004 conda-forge/label/gcc7
xorg-libxpm 3.5.12 h14c3975_1002 conda-forge/label/gcc7
xorg-libxrender 0.9.10 h14c3975_1002 conda-forge/label/gcc7
xorg-libxt 1.1.5 h14c3975_1002 conda-forge/label/gcc7
xorg-renderproto 0.11.1 h14c3975_1002 conda-forge/label/gcc7
xorg-xextproto 7.3.0 h14c3975_1002 conda-forge/label/gcc7
xorg-xproto 7.0.31 h14c3975_1007 conda-forge/label/gcc7
xz 5.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
zstd 1.4.5 h9ceee32_0
Debug bisenetv2_quant.py
Life is just too much fun lol
Convert a model trained in PyTorch to TensorFlow format
"Vitis AI (on Ultra96V2) Custom Platform Tutorial" を試す
Run BiSeNet on TensorFlow
According to the Vitis-AI GitHub (https://github.com/Xilinx/Vitis-AI/tree/v1.2.1/Vitis-AI-Quantizer/vai_q_tensorflow),
Vitis-AI 1.2.1 uses TensorFlow 1.15, so prepare a model that runs on TensorFlow 1.x.
From the User Guide (https://www.xilinx.com/support/documentation/sw_manuals/vitis_ai/1_2/ug1414-vitis-ai.pdf):
p.51: The vai_q_tensorflow quantizer is based on Tensorflow 1.15. The vai_q_pytorch quantizer supports Pytorch from 1.1-1.4.
Try BiSeNet
https://github.com/CoinCheung/BiSeNet
Run its inference on the CPU of the FPGA board.
Clone it on the FPGA and use the pretrained model.
Some dependent libraries were not copied onto the FPGA
Probably because the copy did not dereference the symbolic links.
The libraries apparently need to be copied straight onto the SD card from AY's machine, so an SD card reader is needed.
Can't figure out how to use NVIDIA DALI for semantic segmentation
Official documentation:
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
First, install it with the following command:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
Ideas on accuracy, how to train, etc.
On training
The images are huge, so loading them as-is makes training very slow: at width=2048, 85% of the training time is spent loading images. Downsizing the dataset in advance therefore looks worthwhile; for example, preparing width=1024 copies beforehand should cut training time to roughly a quarter.
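A minimal sketch of that pre-downsizing step, assuming a flat directory of images and Pillow; the directory names are placeholders, not paths from this repo:

```python
# Sketch: pre-downsize a dataset to width=1024 before training.
# Assumptions: a flat directory of images, Pillow installed;
# the directory names used by callers are placeholders.
import os

def target_size(w, h, target_w=1024):
    """Scale (w, h) so the width becomes target_w, preserving aspect ratio."""
    return target_w, round(h * target_w / w)

def downsize_dir(src_dir, dst_dir, target_w=1024):
    from PIL import Image  # pip install Pillow
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        with Image.open(os.path.join(src_dir, name)) as im:
            im.resize(target_size(*im.size, target_w), Image.BILINEAR) \
              .save(os.path.join(dst_dir, name))
```

For segmentation label maps, resize with Image.NEAREST instead of BILINEAR so class IDs are not blended across boundaries.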
Consider whether the app part can be reused from the 2nd contest's hall-of-fame entries
Share the Vitis-AI 1.2 environment
Host kernel
HostName 18.218.168.50
User ubuntu
Port 2211
Implement pruning for BiSeNet in PyTorch
Try various input resizes with BiSeNet
Based on the original BiSeNet code 久留 set up, try models with various input resize settings.
Run tools/train.py with different arguments.
Specifically, vary resolution and num_class, and train each combination with an appropriate learning rate and weight decay.
Then check whether each resolution/num_class combination meets the 0.6 IoU requirement.
First, check whether IoU exceeds 0.6 at 1024x512.
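The 0.6 check can be done with a small mIoU routine. This is a sketch, not the repository's evaluation code: per-class IoU from a confusion matrix, with classes absent from both prediction and ground truth counted as IoU 0.

```python
# Sketch: per-class IoU and mean IoU from flat label maps via a confusion
# matrix. Not the repository's evaluation code; classes absent from both
# pred and gt are counted as IoU 0 here.
import numpy as np

def miou(pred, gt, num_classes):
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    # cm[i, j] = number of pixels with ground truth i predicted as j
    cm = np.bincount(gt * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    ious = inter / np.maximum(union, 1)
    return ious, ious.mean()
```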
DPU timeout error
Running on the actual board as described in the Qiita article produced:
Load image : ILSVRC2012_val_00000001.JPEG
Run DPU Task for ResNet50 ...
[DPU mode]
normal
[DPU timeout limitation (in seconds)]
20
[DPU Debug Info]
Core 0 schedule : 3
Core 0 interrupt: 3
[DPU Resource]
DPU Core : 0
State : Idle
PID : 2861
TaskID : 2861
Start : 548890522784
End : 548155570688
[DPU Core 0 Register]
CTL : 0x00000001
GIE : 0x00000001
IRQ : 0x00000000
HP : 0x07070f0f
CODE : 0x0000000000060100
BASE0 : 0x0000000060300000
BASE1 : 0x0000000061c00000
BASE2 : 0x0000000000000000
BASE3 : 0x0000000000000000
BASE4 : 0x0000000000000000
BASE5 : 0x0000000000000000
BASE6 : 0x0000000000000000
BASE7 : 0x0000000000000000
CYCLE_H : 0x00000000
CYCLE_L : 0x00000000
REGVER : 0x31016b02
TIMESTAMP : 0x13b12123
GITID : 0x06aec4c8
GITTIME : 0x71ea5221
VERSION : 0x00000140
TIMER : 0x00000000
ARCH : 0x31240c0c
RAM : 0x00001333
LOAD : 0x00000102
CONV : 0x00000111
SAVE : 0x00000002
POOL : 0x00000001
ELEW : 0x00000001
DWCV : 0x00000012
MISC : 0x00000001
DPU STATUS: 0x00000000
AXI STATUS: 0x00455656
LOAD START: 1412
LOAD END : 1410
SAVE START: 174
SAVE END : 174
CONV START: 1217
CONV END : 1217
MISC START: 89
MISC END : 89
[DNNDK] DPU timeout while execute DPU Task:resnet50-0
Find the input_nodes and output_nodes to pass to vai_q_tensorflow
When converting the PyTorch model to ONNX, we used torch.onnx.export with input_names="image_array" and output_names="category".
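Those names ("image_array", "category") should surface as node names in the frozen graph. As a hedged sketch (not the repo's tooling): input candidates are nodes with no incoming edges, and output candidates are nodes nothing consumes. The helper below applies that rule to (name, inputs) pairs like those in a GraphDef's node list:

```python
# Sketch: infer input/output node candidates from a graph's edge list.
# With a real GraphDef you would build `nodes` via
#   nodes = [(n.name, list(n.input)) for n in graph_def.node]
# and additionally keep only op == "Placeholder" nodes as inputs,
# since Const nodes also have no incoming edges.
def find_io_nodes(nodes):
    # Strip tensor indices (":0") and control-edge markers ("^") from inputs.
    consumed = {inp.split(":")[0].lstrip("^")
                for _, inputs in nodes for inp in inputs}
    inputs = [name for name, ins in nodes if not ins]       # no incoming edges
    outputs = [name for name, _ in nodes if name not in consumed]  # never consumed
    return inputs, outputs
```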
Write input_fn.py, one of the input files for vai_q_tensorflow
Vitis AI User Guide
p46: For quantize calibration, calibration data without label is enough.
p47: Quantize, using a subset (200 images) of validation data for calibration. Because we are in quantize calibration process, the displayed loss and accuracy are meaningless.
p52: In the quantize calibration process, only a small set of unlabeled images are required to analyze the distribution of activations.
Given these, quantization does not seem to involve training (?). So preparing validation data and applying the evaluation-time preprocessing should be enough (?).
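A minimal input_fn sketch following the contract in the --input_fn help text (take the calibration step as an int, return a dict of placeholder node name to numpy array). The node name, shape, and random placeholder data below are assumptions; in practice load real unlabeled validation images with the evaluation-time preprocessing:

```python
# Sketch of an input_fn module for vai_q_tensorflow calibration.
# Assumptions: placeholder node "image_array", 512x1024 RGB input.
# The random data is a stand-in; feed real preprocessed validation images.
import numpy as np

BATCH_SIZE = 4

def calib_input(iter):
    # `iter` is the calibration step; total images = calib_iter * BATCH_SIZE.
    images = np.random.rand(BATCH_SIZE, 512, 1024, 3).astype(np.float32)
    return {"image_array": images}
```

Saved as e.g. my_input_fn.py, it would be passed as --input_fn my_input_fn.calib_input.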