
nn_dataflow's Introduction

[Badges: Travis CI build status · Coveralls test coverage]

Neural Network Dataflow Scheduling

This Python tool allows you to explore energy-efficient dataflow scheduling for neural networks (NNs), including array mapping, loop blocking and reordering, and (coarse-grained) parallel processing within and across layers.

For hardware, we assume an Eyeriss-style NN accelerator [Chen16], i.e., a 2D array of processing elements (PEs) with a local register file in each PE, and a global SRAM buffer shared by all PEs. We further support a tiled architecture with multiple nodes that can partition and process the NN computations in parallel. Each node is an Eyeriss-style engine as above.

In software, we decouple the dataflow scheduling into three subproblems:

  • Array mapping, which deals with mapping one 2D convolution computation (one 2D ifmap convolves with one 2D filter to get one 2D ofmap) onto the hardware PE array. We support row stationary mapping [Chen16].
  • Loop blocking and reordering, which decides the order between all 2D convolutions by blocking and reordering the nested loops. We support exhaustive search over all blocking and reordering schemes [Yang16], and analytical bypass solvers [Gao17]. (A loop-nest sketch follows this list.)
  • Parallel processing, which partitions the NN computations across the multiple tiled engines. We support both intra-layer and inter-layer parallelism. For intra-layer, we support batch partitioning, fmap partitioning, output partitioning, input partitioning, and their hybrid combinations [Gao17]. We also explore various dataflow optimizations, including access forwarding and buffer sharing [Gao19]. We use exhaustive search within each layer. For inter-layer, we support spatial pipelining (inter-layer pipelining) and temporal pipelining (time multiplexing without writing back intermediate data), as well as their optimized scheduling [Gao19]. We use layer-wise greedy beam search across layers.
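To make the loop blocking and reordering subproblem concrete, below is a minimal Python sketch (ours, not code from this repository) of a blocked, reorderable loop nest over the batch (b), ofmap (o), and ifmap (i) dimensions; the blocking factors and the outer-loop order are exactly what the exhaustive search enumerates:

# A sketch of the loop nest that loop blocking/reordering operates on.
# Each of the three loop dimensions (batch B, ofmaps O, ifmaps I) is
# split into an outer (DRAM-level) factor and an inner (buffer-level)
# factor; the search enumerates the factors and the outer-loop order.
def blocked_conv_loops(B, O, I, tb, to, ti):
    assert B % tb == 0 and O % to == 0 and I % ti == 0
    for b0 in range(tb):                # outer loops: their order is
        for o0 in range(to):            # what reordering explores
            for i0 in range(ti):
                for b1 in range(B // tb):       # inner loops over one
                    for o1 in range(O // to):   # buffered block
                        for i1 in range(I // ti):
                            yield (b0 * (B // tb) + b1,
                                   o0 * (O // to) + o1,
                                   i0 * (I // ti) + i1)

# Example: 4 batches, 64 ofmaps, 3 ifmaps, blocked as (tb, to, ti) = (1, 4, 1).
assert sum(1 for _ in blocked_conv_loops(4, 64, 3, 1, 4, 1)) == 4 * 64 * 3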

See the details in our ASPLOS'17 [Gao17] and ASPLOS'19 [Gao19] papers.

If you use this tool in your work, we kindly request that you reference our paper(s) below, and send us a citation of your work.

  • Gao et al., "TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory", in ASPLOS, April 2017.
  • Gao et al., "TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators", in ASPLOS. April 2019.

Install

nn_dataflow supports Python 3.6 and above.

nn_dataflow can be used directly without installation if you first define the environment variable PYTHONPATH to include the top directory path. See the Usage section below for details.
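For example, assuming the repository was cloned to ~/nn_dataflow:

> export PYTHONPATH=~/nn_dataflow:$PYTHONPATH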

nn_dataflow has been registered on PyPI, so it can be installed through pip as:

> pip install nn-dataflow

And pip will take care of all dependencies.

To install nn_dataflow only into the local user directory (i.e., without sudo), and/or to install it in editable (development) mode, run at the top directory:

> pip install --user -e .

Usage

First, define the NN structure in nn_dataflow/nns. Several popular NNs are already defined for you, including AlexNet, VGG-16, GoogLeNet, ResNet-152, etc.
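As an illustration, a new definition can follow the pattern of the bundled ones. The sketch below assumes the Network and layer classes exported by nn_dataflow.core, with constructor arguments modeled on the bundled examples; check nn_dataflow/nns (e.g., alex_net.py) for the exact signatures:

# my_net.py -- a minimal sketch modeled on the bundled definitions in
# nn_dataflow/nns. The layer constructor arguments below are assumptions;
# consult the existing examples for the exact API.
from nn_dataflow.core import Network
from nn_dataflow.core import InputLayer, ConvLayer, FCLayer, PoolingLayer

NN = Network('MyNet')
NN.set_input_layer(InputLayer(3, 224))      # 3 channels, 224x224 fmaps
NN.add('conv1', ConvLayer(3, 64, 224, 3))   # 3 -> 64 fmaps, 3x3 filters
NN.add('pool1', PoolingLayer(64, 112, 2))   # 2x2 pooling down to 112x112
NN.add('fc', FCLayer(64, 1000, 112))        # fully-connected classifier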

Then, use nn_dataflow/tools/nn_dataflow_search.py to search for the optimal dataflow for the NN. For detailed options, type:

> python ./nn_dataflow/tools/nn_dataflow_search.py -h

You can specify the NN batch size and word size, the PE array dimensions, the number of tile nodes, the register file and global buffer capacities, and the energy costs of all components. Note that the energy cost of the array bus should be the average energy of transferring data from the buffer to one PE (not a local neighbor transfer); the unit static energy cost should be the static energy of all nodes in one clock cycle.
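For example, the following invocation (modeled on one reported in the issues below) searches the VGG-16 dataflow on a 12x14 PE array with a single node:

> python ./nn_dataflow/tools/nn_dataflow_search.py --batch 1 --word 16 --array 12 14 --nodes 1 1 --gbuf 110592 --regf 458 --mem-type 2D vgg_net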

Other options include the following (an example combining several of them follows the list):

  • -g, --goal: E, D, or ED. The optimization goal: e(nergy), d(elay), or the ED product.
  • --mem-type: 2D or 3D. With 2D memory, memory channels are only at the four corners of the chip; with 3D memory, memory channels are on top of all tile nodes (one per node).
  • --bus-width: the multicast bus bit width in the PE array for one data type. Set to 0 to ignore multicast overheads.
  • --dram-bw: float or inf. Total DRAM bandwidth for all tile nodes, in bytes per cycle.
  • --disable-bypass: any combination of i, o, f; disallows global buffer bypass for ifmaps, ofmaps, and filter weights, respectively.
  • --solve-loopblocking: whether to use analytical bypass solvers for loop blocking and reordering. See [Gao17].
  • --hybrid-partitioning: whether to use hybrid partitioning in [Gao17]. If not enabled, use naive partitioning, i.e., fmap partitioning for CONV layers, and output partitioning for FC layers.
  • --batch-partitioning and --ifmap-partitioning: whether the hybrid partitioning also explores batch and input partitioning.
  • --enable-access-forwarding: access forwarding, where the nodes fetch disjoint subsets of data and forward them to other nodes. See [Gao19].
  • --enable-gbuf-sharing: buffer sharing, where the global buffer capacity is shared across nodes through NoC. See [Gao19].
  • --enable-save-writeback: allow eliding the writeback of intermediate data to memory when switching between layers, if the entire data set can be stored in the on-chip buffers.
  • --interlayer-partition: whether to use inter-layer pipelining to partition resources across multiple layers and process them simultaneously.
  • --layer-pipeline-time-overhead, --layer-pipeline-max-degree: constrain the configuration space of inter-layer pipelining, by specifying the maximum execution time overhead, or the maximum pipelining degree.
  • --disable-interlayer-opt: disable optimizations and only allow basic inter-layer pipelining.
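As a sketch of how these compose with the hardware options above (illustrative values only, not a recommended configuration):

> python ./nn_dataflow/tools/nn_dataflow_search.py --batch 16 --word 16 --array 16 16 --nodes 4 4 --gbuf 131072 --regf 512 --goal ED --hybrid-partitioning --interlayer-partition alex_net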

Code Structure

  • nn_dataflow
    • core
      • Top-level dataflow exploration: nn_dataflow, nn_dataflow_scheme.
      • Layer scheduling: scheduling.
      • Array mapping: map_strategy.
      • Loop blocking and reordering: loop_blocking, loop_blocking_scheme, loop_blocking_solver.
      • Intra-layer partitioning: partition, partition_scheme, buf_shr_scheme.
      • Inter-layer pipelining: inter_layer_pipeline, pipeline_segment.
      • Network and layer: network, layer.
    • nns: example NN definitions.
    • tests: unit tests.
    • tools: executables.

Verification and Testing

To verify the tool against the Eyeriss result [Chen16], see nn_dataflow/tests/dataflow_test/test_nn_dataflow.py.
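For example, that test file can be run on its own with pytest:

> pytest nn_dataflow/tests/dataflow_test/test_nn_dataflow.py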

To run (unit) tests, do one of the following:

> python -m unittest discover

> python -m pytest

> pytest

To check code coverage with the pytest-cov plug-in:

> pytest --cov=nn_dataflow

Copyright & License

nn_dataflow is free software; you can redistribute it and/or modify it under the terms of the revised BSD License as published by the Open Source Initiative.

nn_dataflow was originally written by Mingyu Gao at Stanford University, and per Stanford University policy, the copyright of this original code remains with the Board of Trustees of Leland Stanford Junior University.

References

[Gao19] Gao, Yang, Pu, Horowitz, and Kozyrakis, "TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators", in ASPLOS, April 2019.
[Gao17] Gao, Pu, Yang, Horowitz, and Kozyrakis, "TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory", in ASPLOS, April 2017.
[Chen16] Chen, Emer, and Sze, "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks", in ISCA, June 2016.
[Yang16] Yang, Pu, Rister, Bhagdikar, Richardson, Kvatinsky, Ragan-Kelley, Pedram, and Horowitz, "A Systematic Approach to Blocking Convolutional Neural Networks", arXiv preprint, 2016.

nn_dataflow's People

Contributors

cnjsdfcy, derange-alembic, gaomy3832, jingpu, msharmavikram, xuanyoya


nn_dataflow's Issues

2D memory

What does it mean to have memory nodes on two sides? (nn_dataflow_search.py)
When the node has dimension (1x1), you are basically adding two node regions for DATA with the same configuration. What exactly does that represent?

Thanks

Test failures after pip installation

Hi Professor Gao,

I'm just getting started with nn_dataflow and I'm failing a couple of tests after installing. I've attached my terminal contents. Do you have any suggestions? I'm using Ubuntu 18.04, if it matters.

Thank you.

deppe@bardiche:~$ git clone https://github.com/stanford-mast/nn_dataflow.git
Cloning into 'nn_dataflow'...
remote: Enumerating objects: 300, done.
remote: Counting objects: 100% (300/300), done.
remote: Compressing objects: 100% (212/212), done.
remote: Total 3885 (delta 171), reused 160 (delta 88), pack-reused 3585
Receiving objects: 100% (3885/3885), 1.05 MiB | 3.14 MiB/s, done.
Resolving deltas: 100% (2994/2994), done.
deppe@bardiche:~$ cd nn_dataflow
deppe@bardiche:~/nn_dataflow$ pip3 install --user -e .
Obtaining file:///home/deppe/nn_dataflow
Collecting argparse (from nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/f2/94/3af39d34be01a24a6e65433d19e107099374224905f1e0cc6bbe1fd22a2f/argparse-1.4.0-py2.py3-none-any.whl
Collecting coverage>=4 (from nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/2a/3e/fc18ecef69f174c13493576f46966053c1da07fd8721962530dc1a10b1ca/coverage-5.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting fastcache>=1 (from nn-dataflow==2.1)
Collecting pytest-cov>=2 (from nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/b9/54/3673ee8be482f81527678ac894276223b9814bb7262e4f730469bb7bf70e/pytest_cov-2.8.1-py2.py3-none-any.whl
Collecting pytest-xdist>=1 (from nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/7c/8c/7f93c1d82f25a69a1c6e68189b9cf5ddce08dcaefdbd913d328b0234e13b/pytest_xdist-1.31.0-py2.py3-none-any.whl
Collecting pytest>=3 (from nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/c7/e2/c19c667f42f72716a7d03e8dd4d6f63f47d39feadd44cc1ee7ca3089862c/pytest-5.4.1-py3-none-any.whl
Collecting sympy>=1 (from nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/ce/5b/acc12e3c0d0be685601fc2b2d20ed18dc0bf461380e763afc9d0a548deb0/sympy-1.5.1-py2.py3-none-any.whl
Collecting pytest-forked (from pytest-xdist>=1->nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/03/1e/81235e1fcfed57a4e679d34794d60c01a1e9a29ef5b9844d797716111d80/pytest_forked-1.1.3-py2.py3-none-any.whl
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from pytest-xdist>=1->nn-dataflow==2.1) (1.11.0)
Collecting execnet>=1.1 (from pytest-xdist>=1->nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/d3/2e/c63af07fa471e0a02d05793c7a56a9f7d274a8489442a5dc4fb3b2b3c705/execnet-1.7.1-py2.py3-none-any.whl
Collecting packaging (from pytest>=3->nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/62/0a/34641d2bf5c917c96db0ded85ae4da25b6cd922d6b794648d4e7e07c88e5/packaging-20.3-py2.py3-none-any.whl
Collecting importlib-metadata>=0.12; python_version < "3.8" (from pytest>=3->nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/ad/e4/891bfcaf868ccabc619942f27940c77a8a4b45fd8367098955bb7e152fb1/importlib_metadata-1.6.0-py2.py3-none-any.whl
Collecting more-itertools>=4.0.0 (from pytest>=3->nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/72/96/4297306cc270eef1e3461da034a3bebe7c84eff052326b130824e98fc3fb/more_itertools-8.2.0-py3-none-any.whl
Collecting py>=1.5.0 (from pytest>=3->nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/99/8d/21e1767c009211a62a8e3067280bfce76e89c9f876180308515942304d2d/py-1.8.1-py2.py3-none-any.whl
Requirement already satisfied: wcwidth in /usr/lib/python3/dist-packages (from pytest>=3->nn-dataflow==2.1) (0.1.7)
Collecting attrs>=17.4.0 (from pytest>=3->nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/a2/db/4313ab3be961f7a763066401fb77f7748373b6094076ae2bda2806988af6/attrs-19.3.0-py2.py3-none-any.whl
Collecting pluggy<1.0,>=0.12 (from pytest>=3->nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/a0/28/85c7aa31b80d150b772fbe4a229487bc6644da9ccb7e427dd8cc60cb8a62/pluggy-0.13.1-py2.py3-none-any.whl
Collecting mpmath>=0.19 (from sympy>=1->nn-dataflow==2.1)
Collecting apipkg>=1.4 (from execnet>=1.1->pytest-xdist>=1->nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/67/08/4815a09603fc800209431bec5b8bd2acf2f95abdfb558a44a42507fb94da/apipkg-1.5-py2.py3-none-any.whl
Requirement already satisfied: pyparsing>=2.0.2 in /usr/lib/python3/dist-packages (from packaging->pytest>=3->nn-dataflow==2.1) (2.2.0)
Collecting zipp>=0.5 (from importlib-metadata>=0.12; python_version < "3.8"->pytest>=3->nn-dataflow==2.1)
  Using cached https://files.pythonhosted.org/packages/b2/34/bfcb43cc0ba81f527bc4f40ef41ba2ff4080e047acb0586b56b3d017ace4/zipp-3.1.0-py3-none-any.whl
Installing collected packages: argparse, coverage, fastcache, packaging, zipp, importlib-metadata, more-itertools, py, attrs, pluggy, pytest, pytest-cov, pytest-forked, apipkg, execnet, pytest-xdist, mpmath, sympy, nn-dataflow
  Running setup.py develop for nn-dataflow
Successfully installed apipkg-1.5 argparse-1.4.0 attrs-19.3.0 coverage-5.1 execnet-1.7.1 fastcache-1.1.0 importlib-metadata-1.6.0 more-itertools-8.2.0 mpmath-1.1.0 nn-dataflow packaging-20.3 pluggy-0.13.1 py-1.8.1 pytest-5.4.1 pytest-cov-2.8.1 pytest-forked-1.1.3 pytest-xdist-1.31.0 sympy-1.5.1 zipp-3.1.0
WARNING: You are using pip version 19.1.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
deppe@bardiche:~/nn_dataflow$ python3 -m unittest discover
.............No valid schedule found for AlexNet.
No valid schedule found for AlexNet.
............................................................................SSM1
SSM2
SSM3
..................................................................................FFFF.......................................................................................................................................................................................................................................................................................................................................................................
======================================================================
FAIL: test_3d_mem (nn_dataflow.tests.tool_test.test_nn_dataflow_search.TestNNDataflowSearch)
With 3D memory.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/deppe/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_dataflow_search.py", line 45, in test_3d_mem
    self.assertEqual(ret, 0)
AssertionError: 1 != 0

======================================================================
FAIL: test_default_invoke (nn_dataflow.tests.tool_test.test_nn_dataflow_search.TestNNDataflowSearch)
Default invoke.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/deppe/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_dataflow_search.py", line 40, in test_default_invoke
    self.assertEqual(ret, 0)
AssertionError: 1 != 0

======================================================================
FAIL: test_no_dataflow (nn_dataflow.tests.tool_test.test_nn_dataflow_search.TestNNDataflowSearch)
No dataflow scheme found.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/deppe/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_dataflow_search.py", line 53, in test_no_dataflow
    self.assertEqual(ret, 2)
AssertionError: 1 != 2

======================================================================
FAIL: test_default_invoke (nn_dataflow.tests.tool_test.test_nn_layer_stats.TestNNLayerStats)
Default invoke.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/deppe/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_layer_stats.py", line 38, in test_default_invoke
    self.assertEqual(ret, 0)
AssertionError: 1 != 0

----------------------------------------------------------------------
Ran 534 tests in 842.881s

FAILED (failures=4)

A question of the data size of register file

In __init__(...) of class LoopBlockingScheme:
self.data_size(BL.REGF) > resource.size_regf checks the data size held in the register file.
In my understanding, this means the number of times the same data is reused in the register file (tx2, ty2, tz2) × unit_size[REGF], where a procpass is the unit for tx2, ty2, tz2.
And self.unit_size[BL.REGF] = self.nld.usize_regf.
However, in class MapStrategyEyeriss, the return value of _calc_unitpass() is used to build nld (NestedLoopDesc).
Taking a CNN layer as an example: usize_regf[de.FIL] = sz_regf[de.FIL] = buflayer.wfil, usize_regf[de.IFM] = sz_regf[de.IFM] = buflayer.wfil, and usize_regf[de.OFM] = sz_regf[de.OFM] = 1, representing a 1D CONV in a PE.
So, in my opinion, when calculating the data size of the regf, why not use fold.h × usize_regf to represent the regf size for running one procpass, and then multiply by the number of procpass runs during which the regf contents do not change (tx2, ty2, tz2)?

Data movement cost of MACs (in conv layer) with low data-reuse

Hi Gao,

Thanks for the nice tool. It helps us a lot in prototyping our ideas.
One thing I'm not sure about: does this tool appropriately model the data movement cost of MACs with low data reuse? For example, MACs in depth-wise convolution have very low data reuse because of fragmented memory access. In this tool, the cost of any convolution is decided by the dimensions of the convolution layer. Although the dimensions of a depth-wise convolution will be different (because only one filter channel convolves with a single input feature map), is changing the dimensions enough to model the data movement cost, or do we need to model some extra relative cost?

Are SPADs in the PE also double-buffered?

Hi,

In the simulation, are SPADs in the PE also double-buffered?
I just started looking at the code for Eyeriss; it seems like you set the regf as 'size_regf=261,  # (225 + 12 + 24)'.
In the paper, it's 12*16b + 224*16b + 24*16b, which is two times larger.
Also, it seems like the total of ifmap, filter, and psum is 224 + 12 + 24 = 260.
Could you help to clarify a little bit?
Thank you.

Other DNN accelerator architecture support?

Dear Professor Gao,

Thanks for creating and sharing this tool.

Can nn_dataflow easily support dataflow mapping search for other accelerator architectures, e.g., NVDLA?
Since I haven't dived into this tool's code yet, I am wondering whether it is easy to modify the architecture definitions.

B&R.

Cannot reproduce results for vgg-net using eyeriss_isscc16 model

I am not able to reproduce the latency numbers reported in Table 6 of the Eyeriss ISSCC'16 journal paper. I used the test setup in test_nn_dataflow.py. The numbers are perfect for AlexNet, but VGG-16 seems to produce latency proportional to the number of ops, which should not be the case.
The authors of the Eyeriss paper state that although CONV1-2 and CONV4-2 have the same amount of MAC operations, the former takes nearly four times as long to process as the latter. Using the defined dataflow and cost, the latency numbers are similar for both layers.

Temporal schedule in inter-layer pipeline [Gao19]

Hi, I am trying to understand the inter-layer pipeline schedule proposed by [Gao19]. To my understanding:

  1. Before the beam search, each layer is grouped into several segments (each using itself as the ending layer), and each segment is roughly checked for validity with respect to data dependency, resource allocation, and symbolic scheduling constraints, right? While checking resource allocation, I found the following comment:
    # All layers that have model filters must be spatially scheduled.

    Thus I thought two adjacent conv layers would not be temporally scheduled in the same region. But my nn-dataflow result on ResNet-50 looks like: segment0 = [conv1 sched_seq=[0,0,0], pool1 [0,0,1], conv2_0_a [0,1,0], conv2_0_b [0,1,1], conv2_0_c [0,1,2]]; segment1 = [conv2_br [1,0,0], conv2_0_res [1,0,1]] ... Does this mean that [conv2_0_a, conv2_0_b, conv2_0_c] can be temporally scheduled in the same region? Could you please give me more details on the spatial and temporal scheduling rules of inter-layer pipelining? Thanks.
  2. Also, can ALLO be used only when two adjacent conv layers execute on different regions? That is, a result like pool1 [0,0,1], conv2_0_a [0,1,0] should be pipelined without ALLO, right?
    Thank you very much.

total_ops of ConvLayer

Dear Professor Gao,

When calculating the total_ops of ConvLayer, as shown below, I think it should be doubled, since these are all MAC operations.

def ops_per_neuron(self):
    # 2D convolution across all ifmap channels.
    return self.hfil * self.wfil * self.nifm

Can't run

I can't run python ./nn_dataflow/tools/nn_dataflow_search.py -h.

The error is:

Traceback (most recent call last):
  File "./nn_dataflow/tools/nn_dataflow_search.py", line 24, in <module>
    from nn_dataflow.core import NNDataflow
ImportError: No module named nn_dataflow.core

Clean installation on Ubuntu 20.04 fails

Hello! I'm trying to install nn_dataflow on an Ubuntu 20.04 distro with:

  • g++ 9
  • gcc 9
  • python 3.8.10
  • python3 3.9.10

I try to run the tests via the python -m unittest discover command, and no matter the circumstances, it always returns failures like these:

.............No valid schedule found for AlexNet.
No valid schedule found for AlexNet.
............................................................................SSM1
SSM2
SSM3
..................................................................................FFFF.................................................................................................
.......................................................................................................................................................................................
...............................................................................
======================================================================
FAIL: test_3d_mem (nn_dataflow.tests.tool_test.test_nn_dataflow_search.TestNNDataflowSearch)
With 3D memory.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/andreas/Documents/tools/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_dataflow_search.py", line 45, in test_3d_mem
    self.assertEqual(ret, 0)
AssertionError: 1 != 0

======================================================================
FAIL: test_default_invoke (nn_dataflow.tests.tool_test.test_nn_dataflow_search.TestNNDataflowSearch)
Default invoke.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/andreas/Documents/tools/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_dataflow_search.py", line 40, in test_default_invoke
    self.assertEqual(ret, 0)
AssertionError: 1 != 0

======================================================================
FAIL: test_no_dataflow (nn_dataflow.tests.tool_test.test_nn_dataflow_search.TestNNDataflowSearch)
No dataflow scheme found.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/andreas/Documents/tools/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_dataflow_search.py", line 53, in test_no_dataflow
    self.assertEqual(ret, 2)
AssertionError: 1 != 2

======================================================================
FAIL: test_default_invoke (nn_dataflow.tests.tool_test.test_nn_layer_stats.TestNNLayerStats)
Default invoke.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/andreas/Documents/tools/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_layer_stats.py", line 38, in test_default_invoke
    self.assertEqual(ret, 0)
AssertionError: 1 != 0

----------------------------------------------------------------------
Ran 534 tests in 592.050s

FAILED (failures=4)

I first tried to install nn_dataflow in a custom conda environment, which failed. Then I tried installing it using the distro's default python software which is:

Python 3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0] on linux

I installed the software using 2 different ways (well it's the same thing actually but anyway):

  1. via python setup.py install
  2. via pip install --user -e .

After every failure, I deleted the directory with the software and cloned it again, to avoid any prebuilt software subsystem. Nothing works so far. Any ideas?

Is there any example to reproduce the results in [Yang16]?

Hi, thank you for this wonderful project.
I had a similar idea long ago, until I found this project and the related paper.

I want to estimate the optimal dataflow for my custom network. However, I don't know how to decide parameters such as --op-cost / --hier-cost / --hop-cost. Could you please provide an example for one of the cases in the nns folder?

Another issue: the paper [Yang16] seems to take DianNao as the baseline, while this project takes Eyeriss as the base architecture prototype?

How to run a single convolutional layer

Currently for Tetris we need to give the network as a Python file, which is then parsed to produce the results. Is it possible to provide a single layer (conv, pool, or fc) without providing information about the other layers, and obtain results for just that layer?
Currently Tetris checks for mismatches between the current and preceding layers. Will the simulator work correctly if I remove those checks, so that I can provide a single layer at a time and obtain results for it?

Thanks

Unet

Tetris has a mechanism that checks the dimensions of previous layers against the current layer and raises a mismatch error if there is any. For the U-Net network, there is first a decrease in feature maps and then an increase, which is not handled right now in Tetris. Is there a way to support the U-Net model in Tetris, or do we need to add that support?

change the weights

I want to run ResNet-50 with my custom weights. I want to change the weights and see the results (cost, run time, etc.) at inference level. How can I do this?

tiling factors in the output log?

I am trying to understand the size of the feature maps being moved across the memory hierarchy, with the following setup:
python ./nn_dataflow_search.py --batch 1 --word 16 --array 12 14 --nodes 1 1 --gbuf 110592 --regf 458 --mem-type 2D --disable-bypass i o f --disable-interlayer-opt -t 10 -v vgg_net
These results are for the first conv layer of VGG-16. The loop order for the DRAM-GBUF level seems to be 2 0 1 (the output-reuse order defined in the TETRIS paper).

How can I interpret tvals? What are tb, the output tile factors [tof, toy, tox], and the input tile factors [tif, tiy, tix] across the spatial and filter dimensions?
tb must be 1 when the batch size is 1? Is that the case in the output below?

    "tvals": [
      [
        1, 
        64, 
        4
      ], 
      [
        1, 
        1, 
        1
      ], 
      [
        3, 
        1, 
        1
      ]
    ]

    "orders": [
      [
        2, 
        0, 
        1
      ], 
      [
        0, 
        1, 
        2
      ]
    ]


    "ti": [
      1, 
      1, 
      3
    ], 
    "to": [
      64, 
      1, 
      1
    ], 
    "tb": [
      4, 
      1, 
      1
    ], 
    "

How does it calculate the total time of a segment in a parallel manner?

Here is the code in the class NNDataflowScheme (./core/nn_dataflow_scheme.py):

    self.sum_cost = 0         # Naive sum of all layer cost.
    self.sum_static_cost = 0

    self.sum_time = 0         # Naive sum of all layer time, used to adjust cost.

It seems like the total time of a segment is merely the sum of the time consumed by each layer. I can't find the code about pipelining within a segment.
Am I missing something? If so, could you please point out where the code pipelines the layers in a segment?

The calculation of unit pass

In the class MapStrategyEyeriss:

    dim_flpeset = PhyDim2(util.idivc(self.dim_lpeset.h, self.fold.h),
                          util.idivc(self.dim_lpeset.w, self.fold.w))

and the number of flpesets in a unit pass is fold.h. So, as far as I can tell, an lpeset means batch-processing a unit pass fold.w times.
In the function _calc_unitpass, I think the total accessed size of ITCN should be the row size (FIL, IFM, OFM) in one PE × the folded PE array size × the number of flpesets in a unit pass.
I don't understand why the lpeset is used to calculate the accessed size of ITCN (e.g., access[me.ITCN][de.FIL] = acclayer.wfil * self.dim_lpeset.size() * flpesets_per_unitpass) instead of the flpeset. Doesn't multiplying by fold.h duplicate accesses?
Can you solve my problem? Thanks.

Broadcasting

As far as I understand, when processing the FC layers, the same input is shared among all the PEs. Each PE processes this input with different weights and generates different outputs. Right?
If this is true, then each input can be broadcast to all the PEs. However, in the code, when calculating the global buffer size (map_strategy.py: usize_gbuf = tuple(s * n for s, n in zip(sz_gbuf_unitpass, rcnt))), you assume that the same input must be replicated as many times as the PE array size. Is there any specific reason for doing this?

tool-test failed

Hi there. I've just dived into this package, and I'm stuck on an error while running python3 -m unittest test_nn_dataflow_search.py without any changes to the repo. I am wondering why these 3 tests all fail. What should I do to solve this for further use? Thanks a lot.

The errors are described below.

FFF
======================================================================
FAIL: test_3d_mem (test_nn_dataflow_search.TestNNDataflowSearch.test_3d_mem)
With 3D memory.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vapor/tool/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_dataflow_search.py", line 45, in test_3d_mem
    self.assertEqual(ret, 0)
AssertionError: 1 != 0

======================================================================
FAIL: test_default_invoke (test_nn_dataflow_search.TestNNDataflowSearch.test_default_invoke)
Default invoke.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vapor/tool/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_dataflow_search.py", line 40, in test_default_invoke
    self.assertEqual(ret, 0)
AssertionError: 1 != 0

======================================================================
FAIL: test_no_dataflow (test_nn_dataflow_search.TestNNDataflowSearch.test_no_dataflow)
No dataflow scheme found.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vapor/tool/nn_dataflow/nn_dataflow/tests/tool_test/test_nn_dataflow_search.py", line 53, in test_no_dataflow
    self.assertEqual(ret, 2)
AssertionError: 1 != 2

----------------------------------------------------------------------
Ran 3 tests in 0.619s

FAILED (failures=3)
