
sfpi's Introduction

SFPI: Tenstorrent SFPU programming interface

This repo contains SFPI, its associated tests, including an SFPU emulator (complete for GS, partial for WH and not supported), and a TT-enhanced RISC-V gcc that exposes intrinsics for generating SFPU instructions.

User Documentation

https://tenstorrent-metal.github.io/tt-metal/latest/tt_metal/apis/kernel_apis/sfpu/llk.html

Building

  1. Clone the sfpi repo and get a branch:
    git clone git@github.com:tenstorrent-metal/sfpi.git

  2. Initialize the submodules:
    git submodule update --init --recursive

The above pulls many submodules that aren't used (and takes a while). Alternatively:
    git submodule update --init
    cd tt-gcc
    git submodule update --init

If you are working on gcc (probably in a submodule such as riscv-gcc), be sure to check out an appropriate branch. The gcc work isn't done on master; typically it is done on tt-rel/riscv-gcc-10.2.0. If you are just compiling, don't bother with this: the --init above should put you at the right commit.

  3. Configure the compiler:
    export SFPI_ROOT=<path to sfpi top level>
    cd tt-gcc
    ./configure --prefix=$SFPI_ROOT/compiler --disable-multilib --with-abi=ilp32 --with-arch=rv32i

SFPI_ROOT must be an absolute path. Note: the install script and the steps below assume the path used above.
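Since a relative SFPI_ROOT is easy to set by accident, a guard like the one below can catch it before configure runs. This is only a sketch; is_absolute_path is a hypothetical helper and is not part of the sfpi repo.

```shell
#!/bin/bash
# Hypothetical helper (not in the sfpi repo): succeeds only for absolute paths.
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;   # starts with "/", so it is absolute
        *)  return 1 ;;   # empty or relative
    esac
}

# Possible use before ./configure:
#   is_absolute_path "$SFPI_ROOT" || { echo "SFPI_ROOT must be absolute" >&2; exit 1; }
```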

  4. Build the compiler:

The compiler is based on https://github.com/riscv-collab/riscv-gnu-toolchain; see the documentation there for the packages needed to build gcc.

To build:

    make -j <n>

Note: incremental gcc builds are squirrelly. If you aren't sure how to do one, it is best to build from scratch w/ a "make clean" at the tt-gcc level.

  5. Build the tests:
    cd $SFPI_ROOT/sfpi/tests
    make all

If this builds, the compiler at least exports the right intrinsics for use w/ the synced sfpi. If it fails to build, there is likely a sync error (submodule out of date) between what the compiler is exposing and what sfpi is calling.

The tests in the gcc directory just build some code to be sure gcc is sane. The tests in sfpi use the SFPI wrapper and are more complex.
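If the test build fails and a sync error is suspected, `git submodule status` marks each submodule with a one-character prefix: "-" for uninitialized, "+" for a checkout that differs from the commit the superproject records, and "U" for merge conflicts. The sketch below filters that output; stale_submodules is a hypothetical name, not something the repo provides.

```shell
#!/bin/bash
# Hypothetical helper: reads `git submodule status` output on stdin and prints
# the paths of submodules whose status prefix is "-", "+" or "U".
stale_submodules() {
    awk '/^[-+U]/ { print $2 }'
}

# Possible use from $SFPI_ROOT:
#   git submodule status --recursive | stale_submodules
```

An empty result means every submodule sits at the commit the superproject expects.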

  6. Run the tests:
    cd sfpi
    make test

This will run the built x86/kernels and diff the results against a file in gold. That tests the simulator and sfpi. It will also build a few files and diff the assembly (.S files) against those in gold. This is likely brittle, but provides some sanity that the compiler and sfpi are doing at least what they used to do.
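The gold comparison amounts to a diff per generated file. The sketch below shows the general pattern only; check_against_gold and its arguments are hypothetical, and the real logic lives in the test makefiles.

```shell
#!/bin/bash
# Hypothetical helper: compare a freshly generated file against its gold copy.
# Prints a unified diff and returns non-zero when the two differ.
check_against_gold() {
    local actual="$1" gold="$2"
    if diff -u "$gold" "$actual"; then
        echo "PASS: $actual"
    else
        echo "FAIL: $actual differs from $gold"
        return 1
    fi
}
```

Because the assembly comparison is textual, any change in register allocation or scheduling shows up as a diff, which is why the check is brittle.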

  7. Create a release:
    cd $SFPI_ROOT
    bin/release.sh <path to release compiler>

Note: the release script assumes you are overwriting a release (it looks for a directory as a sanity check).

The above will copy lots of files and strip the results, which spews tons of errors. This could be cleaned up, but seems to work (strip fails when it doesn't recognize the file type).
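One way the strip noise could be reduced is to strip only files that start with the ELF magic bytes. This is a sketch of that idea, not what bin/release.sh does; is_elf is a hypothetical helper, and it would skip static archives, which strip can also handle.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the file starts with the 4-byte ELF magic
# (0x7f 'E' 'L' 'F').
is_elf() {
    [ "$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')" = "7f454c46" ]
}

# Possible use over a release tree (release_dir is a placeholder):
#   find "$release_dir" -type f | while read -r f; do
#       is_elf "$f" && strip "$f"
#   done
```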

  8. Running the GCC test suite

This is not required for typical build/release cycles, but should likely be done if the RISCV code paths are altered or any other significant perturbation is made.

The following recipe comes from HelpRack w/ some modifications for issues I ran into.

Prerequisites

The following programs should be installed on your system:

  • dejagnu, libglib2.0-dev, libfdt-dev, libpixman-1-dev, zlib1g-dev, libgtk-3-dev, expect, ninja

(Note: I installed the above with conda)

8a) Build as above

8b) Run:
    make check build-sim -j12

This command also runs test suites which you can inspect in the "$SFPI_ROOT/tt-gcc/build-gcc-newlib-stage2/gcc/testsuite/gcc" directory. The files to inspect are "gcc.log" and "gcc.sum"

Preparing the system to support qemu emulation

8c) Create a file named "riscv32-unknown-elf-run" in the $SFPI_ROOT/compiler/bin directory with the following script:

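#!/bin/bash
# Wrapper that runs a RISC-V ELF under qemu user-mode emulation.
RISC_V_SYSROOT=$SFPI_ROOT/compiler/riscv32-unknown-elf
qemu_args=()
# Collect leading -Wq,<arg> options and forward <arg> to qemu.
while [[ "$1" != "" ]]
do
    case "$1" in
    -Wq,*) qemu_args+=("$(echo "$1" | cut -d, -f2-)");;
    *) break;;
    esac
    shift
done
# Derive the register width (32 or 64) from the ELF class.
xlen="$(readelf -h "$1" | grep 'Class' | cut -d: -f 2 | xargs echo | sed 's/^ELF//')"
qemu-riscv$xlen -r 5.10 "${qemu_args[@]}" -L "${RISC_V_SYSROOT}" "$@"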
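The two pipelines in the script are easier to follow in isolation. The sketch below wraps them in hypothetical helper functions (strip_wq_prefix and xlen_from_class_line are illustrative names only) and feeds them sample input instead of a real ELF.

```shell
#!/bin/bash
# "-Wq,<args>" forwards <args> to qemu: cut drops everything up to and
# including the first comma and keeps any later commas.
strip_wq_prefix() {
    echo "$1" | cut -d, -f2-
}

# Turn the "Class:" line printed by `readelf -h` into the bit width.
xlen_from_class_line() {
    echo "$1" | cut -d: -f2 | xargs echo | sed 's/^ELF//'
}

strip_wq_prefix "-Wq,-strace"                                      # prints -strace
xlen_from_class_line "  Class:                             ELF32"  # prints 32
```

The resulting width is what selects between the qemu-riscv32 and qemu-riscv64 binaries on the last line of the script.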

8d) Mark this file as executable:
    chmod +x riscv32-unknown-elf-run

The above will run some tests and put the results in gcc.log and gcc.sum under $SFPI_ROOT/build-gcc-newlib-stage2/gcc/testsuite/gcc

8e) Set PATH and LD_LIBRARY_PATH:
    export PATH=$SFPI_ROOT/compiler/bin:$PATH
    export LD_LIBRARY_PATH=$SFPI_ROOT/compiler/lib:$LD_LIBRARY_PATH

8f) Create a test directory. You can place this directory wherever you want

8g) Copy the dejagnu configuration file from $SFPI_ROOT/tt-gcc/build-gcc-newlib-stage2/gcc/testsuite/gcc/site.exp to your "test" directory

8h) Edit "site.exp" and add the path to target_boards to this file. Append the following line to the end of the file (expand $SFPI_ROOT manually):
    lappend boards_dir "$SFPI_ROOT/tt-gcc/riscv-dejagnu/baseboards"
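Instead of expanding $SFPI_ROOT by hand, the line can be appended from the shell, which does the expansion for you. A minimal sketch, assuming site.exp already sits in your test directory; append_boards_dir is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: append the boards_dir line to a site.exp file with the
# sfpi root already expanded.
append_boards_dir() {
    local site_exp="$1" sfpi_root="$2"
    printf 'lappend boards_dir "%s/tt-gcc/riscv-dejagnu/baseboards"\n' "$sfpi_root" >> "$site_exp"
}

# Possible use from the test directory:
#   append_boards_dir site.exp "$SFPI_ROOT"
```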

8i) To run all the tests in the GCC test suite:
    runtest -target_board="riscv-sim/-march=rv32iy/-mabi=ilp32/-mcmodel=medlow" -tool gcc

Or, e.g., use "riscv-sim/-march=rv32iy/-mabi=ilp32/-mcmodel=medlow/-mgrayskull", which runs the SFPU passes and found a couple of bugs when first run.

8j) To run a particular test suite, e.g., compile.exp, execute.exp etc.:
    runtest -target_board="riscv-sim/-march=rv32iy/-mabi=ilp32/-mcmodel=medlow" -tool gcc execute.exp

8k) To run a single test file, e.g., gcc.c-torture/execute/fprintf-1.c:
    runtest -target_board="riscv-sim/-march=rv32iy/-mabi=ilp32/-mcmodel=medlow" -tool gcc execute.exp=fprintf-1*
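The runtest commands in 8i through 8k differ only in the trailing board option and the test selector, so the board string can be built once. This is a convenience sketch; target_board is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: build the -target_board value used in 8i-8k, with an
# optional extra compiler flag such as -mgrayskull appended.
target_board() {
    local board="riscv-sim/-march=rv32iy/-mabi=ilp32/-mcmodel=medlow"
    if [ -n "$1" ]; then
        board="$board/$1"
    fi
    echo "$board"
}

# Possible use:
#   runtest -target_board="$(target_board -mgrayskull)" -tool gcc execute.exp
```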

Debug tips:

  • use -verbose to see what the heck is going on
  • change $SFPI_ROOT/compiler/share/dejagnu/dg.exp "set keep 0" to "set keep 1" to keep the executable around
  • look in gcc/testsuite/lib/gcc-dg.exp for the OPTIONS lists to whittle down and run fewer variations

8l) After running a single test as in 8k, the log file will list the options to gcc that were used to run the test. Add the path to the compiler to compile just that single case. Some tests are compilation tests and this will be sufficient, other tests run filters on output or run the simulator to generate a result. These are more complicated and take some digging.

8m) Check "gcc.log" and "gcc.sum" to view the log and summary of the executed tests, respectively.

8n) Check "testrun.log" and "testrun.sum" to view the log and summary of the runtest command.
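The .sum files list one result per line, prefixed with a DejaGnu status such as PASS or FAIL. For a quick tally, something like the sketch below can help; summarize_sum is a hypothetical helper and only counts the status prefixes named in its pattern.

```shell
#!/bin/bash
# Hypothetical helper: count result lines per status in a DejaGnu .sum file.
summarize_sum() {
    awk -F: '/^(PASS|FAIL|XPASS|XFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): / { n[$1]++ }
             END { for (k in n) print k, n[k] }' "$1" | sort
}

# Possible use:
#   summarize_sum gcc.sum
```

Comparing the tally before and after a compiler change gives a quick first signal of regressions before digging through gcc.log.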

sfpi's People

Contributors

pgkeller, tt-mnijjar, urnathan


sfpi's Issues

implement rand for WH/BH

Issuing an sfpmov from register 9 using mod value 8 creates a random int.
This isn't exposed through the sfpi wrapper.
It may work easily in the wrapper, but I'm not sure w/o looking whether the compiler/assembler support mod value 8 or need to be updated.
The code would look something like the following (as an API-level workaround):

vInt rand_gen = l_reg[LRegs::LReg9];
vInt rand = __builtin_rvtt_sfpmov(rand_gen.get(), 8);

investigate unsigned comparisons

I believe there are tests for this and I believe the SFPU supports it, but a user reported an issue w/ comparing an unsigned vector against a signed integer, which behaved as if it were a signed compare.

crash w/ function argument passed into conditional

The code below:

template <bool APPROXIMATION_MODE>
inline void calculate_softplus_body(vFloat beta, vFloat beta_reciprocal, vFloat threshold) {
    vFloat a = dst_reg[0];
    vFloat a_beta = a * beta;
    v_if(a_beta < threshold) {
        exp_init<APPROXIMATION_MODE>();
        a = calculate_exponential_body<APPROXIMATION_MODE>(a_beta) + 1.0f;

        dst_reg[0] = a;
        _calculate_log_body_<false>(0);
        a = beta_reciprocal * dst_reg[0];
    }
    v_endif;
    dst_reg[0] = a;
}

generates:

Illegal rvtt builtin found in conditional tree: sfpxloadi
during GIMPLE pass: rvtt_expand
/localdev/esmal/tt-metal/tt_metal/hw/ckernels/grayskull/metal/llk_api/llk_sfpu/ckernel_sfpu_softplus.h: In function 'calculate_softplus_body.constprop.isra':
/localdev/esmal/tt-metal/tt_metal/hw/ckernels/grayskull/metal/llk_api/llk_sfpu/ckernel_sfpu_softplus.h:39:13: internal compiler error: in process_tree_node, at gimple-rvtt-expand.c:622
   39 | inline void calculate_softplus_body(vFloat beta, vFloat beta_reciprocal, vFloat threshold) {
      |             ^
0x7f607ab8b082 __libc_start_main
        ../csu/libc-start.c:308

My guess is that the fn is not inlined and that the check for the conditional happens before the error checking that the value isn't loaded from memory - we should error out w/ "can't load sfpu vector from memory" but instead we crash. Needs a little investigation

SFPNOP is not issued after required instructions in certain cases.

The compiler is not issuing SFPNOP instructions at least after SFPMUL instructions in certain scenarios even when they are needed.

Example scenario :

Architecture : wormhole_b0
OS : Ubuntu 20.04.6 LTS

When the power_tile function is called it eventually calls calculate_power_iterative which gets inlined. The function has a 2 level loop. The outer loop gets unrolled.

template <bool APPROXIMATION_MODE,int ITERATIONS = 8>
inline void calculate_power_iterative(const uint exponent)
{
    #pragma GCC unroll 8
    for (int d = 0; d < 8; d++)
    {
        vFloat in = dst_reg[0];
        vFloat result = 1.0f;
        for (uint i = 0; i < exponent; i++) {
            result *= in;
            // TTI_SFPNOP;
        }
        dst_reg[0] = result;
        dst_reg++;
    }
}

When the power_tile function is itself called within a loop, the unrolling and inlining leave no SFPNOP after the SFPMUL instruction in the last 5 of the 8 unrolled instances, which leads to a wrong result:

    7834:	00ee0863          	beq	t3, a4, 7844 <kernel_launch()+0x38c>
    7838:	18006402          	sfpmul	L0, L0, L1, L9, 0
    783c:	00170713          	addi	a4, a4, 1
    7840:	ff5ff06f          	j	7834 <kernel_launch()+0x37c>
    7844:	c8030001          	sfpstore	0, L0, 0, 3
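A disassembly can be scanned for this pattern mechanically. The sketch below prints the address of every sfpmul whose next listed instruction is not sfpnop. It is a naive textual check that assumes objdump-style lines (address, encoding, mnemonic) and ignores control flow, and find_unpadded_sfpmul is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read an objdump-style listing (file arguments or stdin)
# and print the address of each sfpmul not immediately followed by sfpnop.
find_unpadded_sfpmul() {
    awk '
        $1 ~ /^[0-9a-f]+:$/ {
            # An earlier sfpmul is still waiting for its sfpnop.
            if (pending != "" && $3 != "sfpnop") print pending
            addr = $1
            sub(/:$/, "", addr)
            pending = ($3 == "sfpmul") ? addr : ""
        }
    ' "$@"
}

# Possible use on a dumped listing:
#   find_unpadded_sfpmul power_loop_bug_asm.txt
```

Run on the five lines above, it reports 7838, because the instruction listed after the sfpmul is an addi.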

How to reproduce:

Architecture : wormhole_b0
OS : Ubuntu 20.04.6 LTS

In the branch
https://github.com/tenstorrent/tt-metal/tree/amahmud/power_debug,
running the python test
pytest ./tests/tt_eager/python_api_testing/unit_testing/misc/test_moreh_norm.py
will compile the kernel ./ttnn/cpp/ttnn/deprecated/tt_dnn/op_library/moreh_norm/moreh_norm_h/kernels/moreh_norm_h_kernel.cpp

This kernel calls power_tile within a loop that runs only once, and because of the issue above it produces redundant values in place of the padded zeroes of the output tensor. You can see the missing SFPNOPs by generating assembly code for the file in
/tt-metal/built/<generated-number>/kernels/moreh_norm_h_kernel/<generated-number>/trisc1/trisc1.elf

In my branch I have already generated and dumped the assembly code and elf files in the folder /tt-metal/assembly/ with appropriate names.

The code in moreh_norm_h_kernel.cpp calls power_tile within a loop. If you comment that part out and uncomment the code above it, which does the same thing outside a loop, the issue does not occur.

void MAIN {
      
    // OMITTED CODE  

/*
    // The following code produces the correct output.
    
    // DPRINT_PACK( DPRINT << "EXECUTE POWER \n";)
    ACQ();
    cb_wait_front(cb_x, onetile);
    cb_reserve_back(cb_y, onetile);

    // copy
    copy_tile_init();
    copy_tile(cb_x, 0, dst0);

    // power
    power_tile_init();
    power_tile(dst0, p);

    pack_tile(dst0, cb_y);

    cb_push_back(cb_y, onetile);

    cb_pop_front(cb_x, onetile);
    REL();
*/
    
    // The loop below runs only once, but unlike the code above it will miss issuing SFPNOPs
    // in the correct places.
    
    // DPRINT_PACK(DPRINT << "num_cols_per_core " << num_cols_per_core << "\n"; )
    // DPRINT_PACK(DPRINT << "Ht " << Ht << "\n"; )
    // This loop only runs once.
    for (uint32_t col_idx = 0; col_idx < num_cols_per_core; ++col_idx) {
        // This loop only runs once.
        for (uint32_t row_idx = 0; row_idx < Ht; ++row_idx) {
            // DPRINT_PACK( DPRINT << "EXECUTE POWER \n";)
            // The following code is identical to the above but produces incorrect output.
            ACQ();
            cb_wait_front(cb_x, onetile);
            cb_reserve_back(cb_y, onetile);

            // copy
            copy_tile_init();
            copy_tile(cb_x, 0, dst0);

            // power
            power_tile_init();
            power_tile(dst0, p);

            pack_tile(dst0, cb_y);
            cb_push_back(cb_y, onetile);

            cb_pop_front(cb_x, onetile);
            REL();
        }
    }
}  // void MAIN
}

Without loop (gives correct result and has SFPNOP after every SFPMUL)
power_no_loop_correct.txt

With loop (does not have SFPNOP after 5 instances of SFPMUL)
power_loop_bug_asm.txt

This issue was identified from tenstorrent/tt-metal#7928

optimize replay insn

When sequences of insns are found that exceed the longest HW-supported sequence, favor the last sequence in the kernel (the compiler currently favors the first).
