
cvfpu's Introduction

FPnew - New Floating-Point Unit with Transprecision Capabilities

Parametric floating-point unit with support for standard RISC-V formats and operations as well as transprecision formats, written in SystemVerilog.

Maintainers: Pasquale Davide Schiavone [email protected], Pascal Gouedo [email protected]
Authors: Stefan Mach [email protected], Luca Bertaccini [email protected]

Features

The FPU is a parametric design that allows generating FP hardware units for various use cases. Even though mainly designed for use in RISC-V processors, the FPU or its sub-blocks can easily be utilized in other environments. Our design aims to be compliant with IEEE 754-2008 and provides the following features:

Formats

Any IEEE 754-2008 style binary floating-point format can be supported, including single-, double-, quad- and half-precision (binary32, binary64, binary128, binary16). Formats can be defined with an arbitrary number of exponent and mantissa bits through parameters and are always symmetrically biased. Multiple FP formats can be supported concurrently, and the number of formats supported is not limited.

Multiple integer formats with an arbitrary number of bits (as source or destination of conversions) can also be defined.
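Because formats are symmetrically biased, a format's key parameters follow directly from its exponent and mantissa widths. A quick host-side sketch (Python, illustrative only, not part of the RTL):

```python
def fp_format_params(exp_bits: int, man_bits: int) -> dict:
    """Derive basic parameters of an IEEE 754-style binary format."""
    bias = (1 << (exp_bits - 1)) - 1      # symmetric exponent bias
    emax = bias                           # largest normal exponent
    emin = 1 - bias                       # smallest normal exponent
    width = 1 + exp_bits + man_bits       # sign + exponent + mantissa
    return {"width": width, "bias": bias, "emin": emin, "emax": emax}

# Standard formats expressed as (exponent, mantissa) widths:
print(fp_format_params(8, 23))   # binary32: width 32, bias 127
print(fp_format_params(5, 10))   # binary16: width 16, bias 15
```

Any custom (exponent, mantissa) pair follows the same rules, which is what allows transprecision formats to be added alongside the standard ones.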

Operations

  • Addition/Subtraction
  • Multiplication
  • Fused multiply-add in four flavours (fmadd, fmsub, fnmadd, fnmsub)
  • Division [1][2]
  • Square root [1][2]
  • Minimum/Maximum [3]
  • Comparisons
  • Sign-Injections (copy, abs, negate, copySign etc.)
  • Conversions among all supported FP formats
  • Conversions between FP formats and integers (signed & unsigned) and vice versa
  • Classification

Multi-format FMA operations (i.e. multiplication in one format, accumulation in another) are optionally supported.

Optionally, packed-SIMD versions of all the above operations can be generated for formats narrower than the FPU datapath width. For example: support for double-precision (64-bit) operations alongside two simultaneous single-precision (32-bit) operations.
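The packing can be pictured on the host: two binary32 lanes occupy one 64-bit datapath word, lane 0 in the low bits. A Python sketch (illustrative, not the RTL):

```python
import struct

def pack2xf32(lane0: float, lane1: float) -> int:
    """Pack two binary32 values into one 64-bit word, lane 0 in the low half."""
    raw = struct.pack('<ff', lane0, lane1)
    return int.from_bytes(raw, 'little')

def unpack2xf32(word: int):
    """Recover the two binary32 lanes from a packed 64-bit word."""
    raw = word.to_bytes(8, 'little')
    return struct.unpack('<ff', raw)

word = pack2xf32(1.0, -2.5)
assert unpack2xf32(word) == (1.0, -2.5)
```

A vectorial operation then simply applies the same scalar operation to each lane of such a word.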

It is also possible to generate only a subset of operations if e.g. divisions are not needed.

[1] The PULP DivSqrt unit has known IEEE 754-2008 compliance issues: rounding mismatches have been reported in GitHub issues, which can lead to results being off by 1 ulp and to the inexact flag not being raised in those cases.
[2] Two DivSqrt units are supported: the multi-format PULP DivSqrt unit and a 32-bit unit integrated from the T-Head OpenE906. The PulpDivsqrt parameter can be set to 1 or 0 to select the former or the latter unit, respectively.
[3] These implement IEEE 754-201x minimumNumber and maximumNumber, respectively.

Rounding modes

All IEEE 754-2008 rounding modes are supported, namely

  • roundTiesToEven
  • roundTiesToAway
  • roundTowardPositive
  • roundTowardNegative
  • roundTowardZero
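The effect of the five modes is easiest to see on a real-to-integer conversion. A host-side model (Python sketch; the RISC-V mnemonics RNE/RMM/RUP/RDN/RTZ are used for brevity, and this is not the RTL):

```python
import math

def round_to_int(x: float, mode: str) -> int:
    """Apply an IEEE 754 rounding mode to a real-to-integer conversion."""
    if mode == 'RNE':                      # roundTiesToEven
        return round(x)                    # Python's round() is ties-to-even
    if mode == 'RMM':                      # roundTiesToAway
        return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)
    if mode == 'RUP':                      # roundTowardPositive
        return math.ceil(x)
    if mode == 'RDN':                      # roundTowardNegative
        return math.floor(x)
    if mode == 'RTZ':                      # roundTowardZero
        return math.trunc(x)
    raise ValueError(mode)

# The modes differ on ties and on the direction of truncation:
print([round_to_int(2.5, m) for m in ('RNE', 'RMM', 'RUP', 'RDN', 'RTZ')])
# 2.5 → 2 (RNE), 3 (RMM), 3 (RUP), 2 (RDN), 2 (RTZ)
```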

Status Flags

All IEEE 754-2008 status flags are supported, namely

  • Invalid operation (NV)
  • Division by zero (DZ)
  • Overflow (OF)
  • Underflow (UF)
  • Inexact (NX)
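In a RISC-V core these five flags accumulate into the fflags CSR, with NV at bit 4 down to NX at bit 0. A small encoding sketch (Python, illustrative):

```python
# RISC-V fflags bit positions (NV is the most significant of the five).
FLAGS = {'NV': 4, 'DZ': 3, 'OF': 2, 'UF': 1, 'NX': 0}

def encode_fflags(*raised: str) -> int:
    """Accumulate a set of raised exception flags into the 5-bit fflags value."""
    value = 0
    for flag in raised:
        value |= 1 << FLAGS[flag]
    return value

print(hex(encode_fflags('OF', 'NX')))   # overflow always implies inexact: 0x5
```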

Getting Started

Dependencies

FPnew's dependencies are included in the source code directory as git submodules. Run

git submodule update --init --recursive

to fetch them.

Consider using Bender for managing dependencies in your projects. FPnew comes with Bender support!

Usage

The top-level module of the FPU is called fpnew_top and can be directly instantiated in your design. Make sure you compile the package fpnew_pkg ahead of any files making references to types, parameters or functions defined there.

Importing all of fpnew_pkg into your source files is discouraged. Instead, explicitly scope references into the package, e.g. fpnew_pkg::foo.

Example Instantiation

// FPU instance
fpnew_top #(
  .Features       ( fpnew_pkg::RV64D          ),
  .Implementation ( fpnew_pkg::DEFAULT_NOREGS ),
  .TagType        ( logic                     )
) i_fpnew_top (
  .clk_i,
  .rst_ni,
  .operands_i,
  .rnd_mode_i,
  .op_i,
  .op_mod_i,
  .src_fmt_i,
  .dst_fmt_i,
  .int_fmt_i,
  .vectorial_op_i,
  .simd_mask_i,
  .tag_i,
  .in_valid_i,
  .in_ready_o,
  .flush_i,
  .result_o,
  .status_o,
  .tag_o,
  .out_valid_o,
  .out_ready_i,
  .busy_o
);

Documentation

More in-depth documentation on the FPnew configuration, interfaces and architecture is provided in docs/README.md.

Issues and Contributing

In case you find any issues with FPnew that have not been reported yet, don't hesitate to open a new issue on GitHub. Please don't use the issue tracker for support questions; instead, consider contacting the maintainers or consulting the PULP forums.

In case you would like to contribute to the project, please refer to the contributing guidelines in docs/CONTRIBUTING.md before opening a pull request.

Repository Structure

HDL source code can be found in the src directory while documentation is located in docs. A changelog is kept at docs/CHANGELOG.md.

This repository loosely follows the GitFlow branching model. This means that the master branch is considered stable and used to publish releases of the FPU while the develop branch contains features and bugfixes that have not yet been properly released.

Furthermore, this repository tries to adhere to SemVer, as outlined in the changelog.

Licensing

FPnew is released under the SolderPad Hardware License, which is a permissive license based on Apache 2.0. Please refer to the SolderPad license file for further information.

The T-Head E906 DivSqrt unit, integrated into FPnew in vendor/opene906, is released under the Apache License, Version 2.0. Please refer to the Apache 2.0 license file for further information.

Publication

If you use FPnew in your work, you can cite us:

FPnew Publication

@article{mach2020fpnew,
  title={{FPnew}: An open-source multiformat floating-point unit architecture for energy-proportional transprecision computing},
  author={Mach, Stefan and Schuiki, Fabian and Zaruba, Florian and Benini, Luca},
  journal={IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
  volume={29},
  number={4},
  pages={774--787},
  year={2020},
  publisher={IEEE}
}

Acknowledgement

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 732631.

For further information, visit oprecomp.eu.


cvfpu's People

Contributors

aklsh, andreaskurth, bluewww, davideschiavone, fabianschuiki, flaviens, florent-gwt, gregdavill, gurkaynak, huettern, lucabertaccini, meggiman, michael-platzer, mikeopenhwgroup, mp-17, msfschaffner, owenchj0, pascalgouedo, stmach, zarubaf


cvfpu's Issues

what's the meaning for CPKAB and CPKCD operation?

The CPKAB and CPKCD operations are introduced in the README.md, but I cannot work out what they mean. Is there a difference between CPKAB and CPKCD, or between op_mod = 0 and op_mod = 1?

I read the RTL code (fpnew_opgroup_multifmt_slice.sv and fpnew_cast_multi.sv) and ran some simulations to help me understand; however, all of the following cases produce the same result:

  1. src_fmt is fp64, dst_fmt is fp16, op=fpnew_pkg::CPKAB, op_mod=0
  2. src_fmt is fp64, dst_fmt is fp16, op=fpnew_pkg::CPKAB, op_mod=1
  3. src_fmt is fp64, dst_fmt is fp16, op=fpnew_pkg::CPKCD, op_mod=0
  4. src_fmt is fp64, dst_fmt is fp16, op=fpnew_pkg::CPKCD, op_mod=1

So is there a detailed description of the CPKAB and CPKCD operations somewhere?
Thanks.

FDIV and FSQRT Latency

Hi Team,
Is there any data about the latency of the FDIV and FSQRT operations? Does the latency depend on the data width?

-Pranay

Benchmarking of PULP FPU against Whisper ISS

To whom it may concern,

We are from the Microelectronics Research Lab (MERL), based in Pakistan, working on developing RISC-V based ASICs and SoCs.

Currently we are working on an indigenously designed RISC-V based Floating Point Unit (AI-FPU). Initially we used the PULP FPU as a benchmark for the verification of our AI-FPU.

However, during the verification process we encountered a few cases where the PULP FPU was not 100% compatible with the Whisper ISS. A glimpse of those errors is shown in the screenshots attached to the original issue.

Thank you,

Regards:
Engr Hamza Shabbir
Research Associate
MERL

Synthesizable

Can this code be synthesized as-is in Synopsys DC? If not, does it need to be changed to synthesizable constructs? Thanks in advance.

DC Synthesis for cvfpu (fpnew) occurs fatal error

When I try to synthesize FPnew using Design Compiler (with fpnew_top or fpnew_opgroup_block as the top module), the following error always appears: "The tool has just encountered a fatal error", "Fatal: Internal system error, cannot recover".

I looked for some corresponding solutions, but they didn't help. (like: https://www.edaboard.com/threads/fatal-internal-system-error-cannot-recover.203320/)

Does anyone know the reason? For example, is the DC version I am using too old?

Unable to run Floating point Unit

Hi All,

I downloaded this repository today. Can someone guide me on how to run the simulation?

Also, do I need the PULP GNU toolchain to run this, or will the RISC-V GNU toolchain work?

If it is useful: I am using the cv32 core.

Simple As Possible (SAP-FPU)

Hello
@lucabertaccini @MikeOpenHWGroup @pascalgouedo @JeanRochCoulon @jquevremont

We are from the Microelectronics Research Lab (MERL-DSU). MERL is a non-profit organization with an ambitious plan to lead microelectronics research & development in Pakistan. MERL's vision is to enable Pakistan to become a recognized global player in the microelectronics industry, and its mission is to train undergraduate students of Pakistan in the field of ASIC design.

Our RTL team has designed a Simple As Possible Floating Point Unit (SAP-FPU) based on the IEEE 754 format. This initial version contains 22 RISC-V F-extension instructions. The design files are available in our GitHub repository.

https://github.com/merldsu/SAP_FPU.git

Regards:
MERL-DSU

Narrowing converting instruction mismatch issue for number infinity

If a number is infinity in FP64, should it remain infinity after conversion to FP32, or become the value of maximum magnitude? According to Spike it is infinity, but according to the RTL implementation it is of maximum magnitude. Currently, overflow and infinity are handled together, i.e. both saturate to the maximum magnitude. I think it is fine to handle overflow like this, but infinity should be a separate case: infinity should remain infinity.

[Bug] Incorrect Accumulation of ‘OF’ Flag in fflags After Executing fsqrt.d on Infinity

Bug Description
Hi,

When executing the fsqrt.d instruction on a double-precision floating-point value representing positive infinity (0x7ff0000000000000), the Overflow (OF) flag in the fflags register is erroneously set. According to the IEEE 754 standard and the RISC-V specification, the fsqrt.d operation should not lead to an overflow situation when the input is infinity. This also results in inconsistency with Spike's output.

Expected Behavior:
The OF flag in the fflags register should remain clear (i.e., not set) after performing a square root operation on an infinite value, as the result is well-defined and should be positive infinity.

Actual Behavior:
The OF flag is set in the fflags register, indicating an overflow, which contradicts the expected behavior defined by the IEEE 754 standard and RISC-V floating-point operation guidelines.

Steps to Reproduce:

Load a double-precision floating-point register with the value 0x7ff0000000000000 (positive infinity).
Execute the fsqrt.d instruction on this register.
Check the fflags register; observe that the OF flag is incorrectly set.
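The expected result itself is easy to confirm against host IEEE 754 arithmetic (illustrative Python, not the RTL under test):

```python
import math

r = math.sqrt(float('inf'))
assert math.isinf(r) and r > 0   # sqrt(+inf) is exactly +inf: no overflow and
                                 # no inexactness, so no flag should be raised
```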

`fsgnjx.s` instruction is misbehaving with operand-b as infinity

Found a bug in the floating point unit: the fsgnjx.s instruction is misbehaving.

Ideally, the sign-injection instructions copy a value from one register to another while modifying the sign bit based on the sign of another value.

Examples:

FSGNJ.S  f2,f5,f6 	        # f2 =  sign(f6)  * |f5|
FSGNJN.S f2,f5,f6		# f2 = -sign(f6)  * |f5|
FSGNJX.S f2,f5,f6		# f2 =   sign(f6) *  f5

When the f6 (rs2) register holds infinity and the f5 (rs1) register holds any number, Spike (the ISS) returns, for FSGNJX.S, the value of f5 with its sign XORed with the sign of f6 (positive here); the FPU instead returns a NaN, which is incorrect.
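Sign-injection operations act purely on bit patterns, so an infinity in rs2 contributes only its sign bit and can never produce a NaN. A bit-level model for binary32 (Python sketch with hypothetical helper names, not the RTL):

```python
SIGN = 1 << 31   # sign bit of a binary32 pattern

def fsgnj(a: int, b: int) -> int:   # result takes sign(b)
    return (a & ~SIGN) | (b & SIGN)

def fsgnjn(a: int, b: int) -> int:  # result takes the inverted sign(b)
    return (a & ~SIGN) | (~b & SIGN)

def fsgnjx(a: int, b: int) -> int:  # result sign is sign(a) XOR sign(b)
    return a ^ (b & SIGN)

PLUS_INF = 0x7F800000
# FSGNJX with +inf as rs2: the magnitude of b is irrelevant, only its sign bit.
assert fsgnjx(0xC0200000, PLUS_INF) == 0xC0200000   # -2.5 keeps its sign
```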

References

Screenshots of logs from the spike and FPU design are attached below for a reference

image

How to perform Div/Sqrt operation using the DIVSQRT unit

Hello,

I established a testbench that instantiates fpnew_top, and I'm trying to understand how to drive the interface to feed the input operands and get the result back.

I use the following instructions inside an initial block. With MULADD operations I get the correct results, but when I change fpu_op to fpnew_pkg::SQRT or fpnew_pkg::DIV I get all zeroes at the output.

I use the same Implementation and Feature parameters used in the EX stage of RI5CY. Simulation is performed using ModelSim 10.2c. Do you have any idea why I don't get results for DIV and SQRT? Which version of ModelSim works with your SystemVerilog testbenches?

        // The input operands
        apu_operands_i[0] = 32'b00111111100000000000000000000000; // 1.0
        apu_operands_i[1] = 32'b01000000001000000000000000000000; // 2.5
        apu_operands_i[2] = 32'b00111111100000000000000000000000; // 1.0

        // Operation to be executed
        fpu_op        = fpnew_pkg::ADD;
        fpu_vec_op    = 1'b0;
        fpu_op_mod    = 1'b0; // fpu_op = ADD && fpu_op_mod = 0 ==> ADDITION
                              // fpu_op = ADD && fpu_op_mod = 1 ==> SUBTRACTION

        // Input/Output formats
        fpu_int_fmt   = fpnew_pkg::INT32;
        fpu_src_fmt   = fpnew_pkg::FP32;
        fpu_dst_fmt   = fpnew_pkg::FP32; // By default, dest types = src type. fpnew_pkg::FP32;
        fp_rnd_mode   = fpnew_pkg::RNE;

        // Ready/Valid Interface
        in_valid_i = '1;

I have also got several warnings, which I just suppressed; could these affect the results of the simulation?

vlog-2583: [SVCHK] - Some checking for conflicts with always_comb and always_latch variables not yet supported. Run vopt to provide additional design-level checks.

Thank you very much in advance.
Best regards,

Classify operation output giving incorrect value

I am running the CLASSIFY operation on the following configuration:

FEATURES = '{
Width: 64,
EnableVectors: 1'b1,
EnableNanBox: 1'b1,
FpFmtMask: 5'b11111,
IntFmtMask: 4'b1111};

IMPLEMENTATION = '{
PipeRegs: '{default: 0},
UnitTypes: '{'{default: fpnew_pkg::MERGED},
'{default: fpnew_pkg::MERGED},
'{default: fpnew_pkg::PARALLEL},
'{default: fpnew_pkg::MERGED}},
PipeConfig: fpnew_pkg::BEFORE};

I am providing inputs as two packed FP32 elements. According to the RISC-V spec, a classify instruction produces a 10-bit mask, so in this case I expected two 10-bit masks, one for each input element; however, I am getting a strange result value, as seen in the screenshot attached to the original issue.

The inputs I have provided are positive normal numbers therefore I would expect two identical masks with hex values "0x0000004000000040" but the output is "0x0000000000004242".
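For reference, a host-side model of the RISC-V classify semantics for a single binary32 element (Python sketch; bit positions per the RISC-V fclass definition, not the RTL):

```python
def classify_f32(bits: int) -> int:
    """RISC-V fclass.s: one-hot class mask for a binary32 bit pattern."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        if frac == 0:
            return 1 << (7 if not sign else 0)      # +inf / -inf
        return 1 << (9 if frac & 0x400000 else 8)   # quiet / signaling NaN
    if exp == 0:
        if frac == 0:
            return 1 << (4 if not sign else 3)      # +0 / -0
        return 1 << (5 if not sign else 2)          # +/- subnormal
    return 1 << (6 if not sign else 1)              # +/- normal

assert classify_f32(0x3F800000) == 0x40   # 1.0 is a positive normal number
```

A packed classify over two FP32 lanes should therefore yield this mask once per lane, consistent with the expected value quoted above.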

synthesis: fpnew_divsqrt_multi register retiming

Hi, I'm trying to push the frequency a bit higher and find that timing doesn't improve beyond a point no matter how many input/output registers I add to the divsqrt unit. There seem to be sequential loops inside the following module.

i_fpnew_divsqrt_multi/i_divsqrt_lei/nrbd_nrsc_U0/control_U0/

Is this a known issue and is there a way around it?

Thanks,
PaulK

Errors on berkeley testfloat-3e

Hi pulp-team,
I executed the Berkeley TestFloat-3e suite on the Verilator model of the RI5CY core with FPU+DIV_SQRT, and here are the logs of all tests of unary and binary functions. I know that it's quite common to find errors through this set of tests on any CPU, but it might be helpful for checking IEEE 754 compliance and deciding what you do (or do not) want to be compliant with. Logs attached =)

P.S.: the latest master branches of fpnew / the RISC-V core were used (...e1910 / ...813a7)
P.S.2: testfloat/softfloat were compiled with march=rv32imafc / mabi=ilp32f / -O2

log_testfloat_3e_unary_riscv_none_embedded_ri5cy_verilator_model.txt
log_testfloat_3e_binary_riscv_none_embedded_ri5cy_verilator_model.txt

Mismatching Calculations

I have been using the Imperas RISC-V tests included with riscvOVPsim for compliance testing and have some queries on the F ISA extension tests.

OVPSim : https://github.com/riscv-ovpsim/imperas-riscv-tests
F ISA test download page: https://www.ovpworld.org/library/wikka.php?wakka=riscvOVPsimPlus

These tests from Imperas are the updated RISC-V compliance tests, and FPnew has some mismatches on the FDIV, FMUL and FSQRT tests.

Here are the mismatching calculations:
FPNEW_MISMATCH_COMPLIANCE_V2.xlsx

I have only seen the F ISA extension tests pass with the models (e.g. C/BFM), so this suggests there is a mismatch between the RTL and the models.

Can you comment on my findings?

[BUG] Incorrect Result from 'fdiv' Operation Leading to Negative Infinity Instead of Smallest Negative Number

Bug Description

When performing floating-point division using the fdiv.d instruction in CVA6, under specific conditions where the expected result is the smallest representable negative double-precision floating-point number (0xffefffffffffffff), CVA6 incorrectly returns negative infinity (0xfff0000000000000).

Steps to Reproduce:

  1. Load fa7 with 0xc024000000000000.
  2. Load fs3 with 0x00000000000002cc.
  3. Execute the instruction: fdiv.d ft6, fa7, fs3.
  4. Observe the resulting value in ft6.

The log is as follows:

core   0: 0x00000000800052b2 (0x233987d3) fsgnj.d fa5, fs3, fs3
1 0x00000000800052b2 (0x233987d3) f15 0x00000000000002cc
core   0: 0x00000000800052b6 (0x23188853) fsgnj.d fa6, fa7, fa7
1 0x00000000800052b6 (0x23188853) f16 0xc024000000000000
core   0: 0x00000000800052ba (0x1b38f353) fdiv.d  ft6, fa7, fs3
1 0x00000000800052ba (0x1b38f353) f 6 0xfff0000000000000

See: openhwgroup/cva6#2060

Is it possible to do BF16*int8 + int8 using this IP? (#103)

Hi all,
I need an FMA block that can do BF16*int8+int8. Is there a way to configure this option in the IP? If not, what are the minimum conversions I can do from int8 (actually int9) to get the FMA to work in this IP?
Any suggestions or guidance would be appreciated.

-K

DIVSQRT vectorial lane synchronization

In the case where some lanes experience faster or slower execution than lane 0, the pipeline can become unsynchronized.

This can currently only happen if some lanes in the vectorial DIVSQRT unit handle special cases while others don't, but could in theory happen with any variable-latency unit (which we currently don't have).

Error in Synthesis: Constant Value required

Manually transferring this issue from here:

openhwgroup/cva6#577 (comment)

When I try to synthesize the design, it gives me an error in the function any_enabled_multi in fpnew_opgroup_block.sv:
Error: ../cva6/src/fpu/src/fpnew_opgroup_block.sv:81: Constant value required. (ELAB-922)
The original function is defined in fpnew_pkg.sv (screenshot in the original issue).

Exception Flag handling for Underflow

As per IEEE 754-2008, Section 7.5, exact subnormal results should not raise the underflow exception flag.

Basically all operations that can signal UF are affected.

FPU Configuration Discussion

There are plenty of configuration options for the FPU, almost all of which currently are passed to the top level module as parameters.
This leads to a cumbersome layout and a large number of parameters, since there are options for every operation and format (the number of which can be redefined in the package).

This issue aims at finding a satisfactory solution for configuring the design. Initial considerations are:

Everything through parameters (current state)

Currently, (almost) all configuration is done through parameters. The set of available FP and int formats are set up in the package and can then be enabled/disabled via parameters.

Pros

  • Full control of design-space without changing RTL for different implementation options (like varying number of pipeline stages, enabling/disabling formats or operations etc.)
  • Multiple versions of the FPU within one design / project easily possible - differentiated through parameter list.

Cons

  • Many parameters:
    • either a bulky array-of-arrays style parametrization (current, module interface independent of number of formats supported)
    • or a huge list of options (module interface dependent on the number of formats defined in the pkg)
  • Large amount of parameters are unrolled into long and difficult to manage module names in synthesis
  • Most of these parameters are not modified anymore after an initial exploration

Everything from the package

Alternatively, all configuration could be done through parameters in the package.

Pros

  • No parameters cluttering up the modules

Cons

  • Each configuration change requires modifying the RTL
  • Only one version of the package (= FPU configuration) possible in the design
  • Different configurations of the FPU are not discernible through the module name; alignment issues are more likely if an outdated pre-compiled design and changed RTL (after a parameter change) are combined

A mix of the two

It might make sense to reduce the number of parameters to just the most frequently modified ones. However, identifying these is difficult, and the cons of both options might need to be dealt with.

Bug report: fdiv.s taking two NaNs may return a valid number, and this is not always the consequence of a wrong rounding

Hi there!

Overview

A previous issue showed that fdiv can return a result off by one and therefore return NaN instead of infinity. I found that fdiv.s may also return wrong results for NaN inputs where the output is not just off by one. Hence, this is a different bug from the reported off-by-one; I do not know at the moment whether the root cause is shared.

Snippet

  .section ".text.init","ax",@progbits
  .globl _start
  .align 2
_start:

  # Enable the FPU
  li t0, 0x2000
  csrs mstatus, t0
  csrw	fcsr,x0

  la t0, .fdata0
  la t1, .fdata1
  fld ft0, (t0)
  fld ft1, (t1)

  fdiv.s ft2, ft0, ft1, rup

infinite_loop:
  j infinite_loop

.section ".fdata0","ax",@progbits
  .8byte 0xeed295ee2a0a6df4 # nan
.section ".fdata1","ax",@progbits
  .8byte 0x0707b7830687b703 # nan

We get ft2=0xffffffff63028f6f = 2.40841e+21, while the expected result is 0xffffffff7fc00000 = nan, validated with Spike.
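Whatever the exact encodings, the expected semantics are unambiguous: a division with a NaN input must yield NaN (canonicalized to 0x7fc00000 for single precision in RISC-V), never a finite value. A host-arithmetic check (illustrative, not the RTL):

```python
import math

a = float('nan')
b = float('nan')
assert math.isnan(a / b)   # NaN / NaN propagates NaN; a finite result is a bug
```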

Note

Interestingly, the result is also wrong with cva6 version 17743bc7120f1eb24974e5d7eb7f519ef53c4bdc, but with a different value: 0x83028f70 = -3.83682e-37.

Thanks!
Flavien

FDIV FSQRT Rounding Mode Calculation Mismatches

FDIV and FSQRT rounding-mode calculation mismatches were found when comparing the results from FPnew to a RocketChip RISC-V IMAF processor.

I have attached a few examples of calculation mismatches found:
FDIV FSQRT Rounding Mode Calculation Mismatches .xlsx

Note: I have applied the solution from issue #47, since it fixed rounding-mode issues found when testing against the RISC-V compliance tests, and with it included I have found that fewer rounding-mode issues are detected.

Can you comment on my findings?

[BUG] Unexpected Rounding Behavior During fsqrt.d Execution

Bug Description

In some rounding modes, precision errors occur when calculating fsqrt.d, and the results differ from those of Spike.
Testing has shown that RDN, RUP, RMM and DYN all experience rounding errors.

How to reproduce, and the logs, are as follows:
Example:
Initialize fa7 with 0x402e000000000000 and execute fsqrt.d ft10, fa7.
Under RDN, DYN Mode:

Spike results are: ft10 = 0x400efbdeb14f4eda
CVA6 results are: ft10 = 0x400efbdeb14f4ed9

Under RUP , RMM Mode:
Spike results are: ft10 = 0x400efbdeb14f4ed9
CVA6 results are: ft10 = 0x400efbdeb14f4eda

Moreover, the rounding-mode issue described above also occurs with the fdiv instruction.

Example:
When executing the instruction fdiv.d fs9, fs10, fa4 with the following register values:

fs10 = 0x41ddc16575c00000
fa4  = 0x41e0000c09000000

The results differ between CVA6 and Spike simulator as follows:
Round Toward Zero (RTZ) Mode:
CVA6 Result: 0x3fedc14f1407f47f
Spike Result: 0x3fedc14f1407f47e (correct value as verified)
Round Up (RUP), Round to Nearest, Tie to Max Magnitude (RMM), and Dynamic (DYN) Rounding Modes:
CVA6 Result: 0x3fedc14f1407f47e
Spike Result: 0x3fedc14f1407f47f

See: openhwgroup/cva6#2057

Underflow flag not set at subnormal/normal boundary

Background

I've been running functional tests/validation on this fpu, by running test vectors generated by testfloat. http://www.jhauser.us/arithmetic/TestFloat.html

I've noticed a class of bugs around the underflow flag. Here are some errors produced by testfloat:

Errors found in f32_mul, rounding near_even:
+01.000000  +7E.7FFFFF  => +01.000000 ....x  expected +01.000000 ...ux
+01.000000  -7E.7FFFFF  => -01.000000 ....x  expected -01.000000 ...ux
+01.7FFFFF  +7E.000000  => +01.000000 ....x  expected +01.000000 ...ux
 
Errors found in f32_mulAdd, rounding min:
+54.61FFFE  -02.70001F  -00.7FFFFF
=> -01.000000 ....x  expected -01.000000 ...ux
+00.000001  -7D.7FBFFF  -00.7FFFFF
=> -01.000000 ....x  expected -01.000000 ...ux

This affects the FMA operations using the multiply path (MUL, F[N]MADD, F[N]MSUB) and the DIV operation (using the T-Head div/sqrt).
This bug shows up for all rounding modes except RTZ.

Issue

The fact that this does not affect the RTZ rounding mode led me to the underflow statement in the spec (screenshot in the original issue).

This bug is the corner case where the final rounded result is +/-b^emin, but the result computed as if the exponent range were unbounded lies strictly between +/-b^emin. In this case the underflow flag should be set, which is what happens in softfloat, but not in the FMA or T-Head DIV units.

In RTZ mode you can never hit the case where a subnormal result rounds up into a normal result.

[BUG] Incorrect Accumulation of ‘OF’ Flag in fflags After Executing fsqrt.d on Infinity

Bug Description

When executing the fsqrt.d instruction on a double-precision floating-point value representing positive infinity (0x7ff0000000000000), the Overflow (OF) flag in the fflags register is erroneously set. According to the IEEE 754 standard and the RISC-V specification, the fsqrt.d operation should not lead to an overflow situation when the input is infinity. This also results in inconsistency with Spike's output.

Expected Behavior:
The OF flag in the fflags register should remain clear (i.e., not set) after performing a square root operation on an infinite value, as the result is well-defined and should be positive infinity.

Actual Behavior:
The OF flag is set in the fflags register, indicating an overflow, which contradicts the expected behavior defined by the IEEE 754 standard and RISC-V floating-point operation guidelines.

Steps to Reproduce:

  1. Load a double-precision floating-point register with the value 0x7ff0000000000000 (positive infinity).
  2. Execute the fsqrt.d instruction on this register.
  3. Check the fflags register; observe that the OF flag is incorrectly set.

EDIT: A PR has been submitted pulp-platform/fpu_div_sqrt_mvp#25

EDIT: And See: openhwgroup/cva6#2058

Bug report: wrong sign in fmul.d with RDN rounding mode

Hi there!

I found a bug in FPnew (bug found through Ariane).

Bug

The operand_c should not be signed for floating multiplications using RDN.

Observable consequences

The floating multiplication gives wrong results for some inputs with the RDN rounding mode.

For example, the pseudo assembly code below executes incorrectly:

ft0 = 0x0 # corresponds to value +0.0
ft1 = 0x00000000dc98da8a # corresponds to value +1.82854e-314

fmul.d ft2, ft1, ft0

In the end of the snippet:

  • We expect the result to be 0x0, i.e., +0.0
  • We get the result 0x8000000000000000, i.e., -0.0

Related issues

This seems to be the underlying cause of the observations made in openhwgroup/core-v-verif#54, openhwgroup/core-v-verif#63, part of openhwgroup/core-v-verif#68, this external issue in core-v-verif and this external issue in cv32e40p.
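The sign of a floating-point product is the XOR of the operand signs, independently of the rounding mode, so +0.0 times a positive (subnormal) value must be +0.0 under RDN as well. A host check of the expected result (Python, illustrative; the host rounds with RNE, but the sign rule is mode-independent):

```python
import math
import struct

# The operand from the report: a positive double subnormal, ~1.82854e-314.
x = struct.unpack('<d', struct.pack('<Q', 0x00000000DC98DA8A))[0]
r = 0.0 * x
assert r == 0.0 and math.copysign(1.0, r) == 1.0   # +0.0, not -0.0
```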


fcvt.s.d gives wrong answer

I tried to use fcvt.s.d to convert a small double (2.386004908190509275e-40, which is 0x37b4c8f800000000) to single precision, but it gives inf (0x7f800000) instead of 2.38600491e-40 (0x0002991f).
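The expected narrowing result can be reproduced on the host, since this value converts exactly to a binary32 subnormal (Python, illustrative, not the RTL):

```python
import struct

# Decode the double, then round-trip through binary32.
d = struct.unpack('<d', struct.pack('<Q', 0x37B4C8F800000000))[0]  # ~2.386e-40
f32_bits = struct.unpack('<I', struct.pack('<f', d))[0]

assert f32_bits == 0x0002991F        # a binary32 subnormal, not infinity
assert f32_bits != 0x7F800000        # definitely not +inf
```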

Bug report: FSQRT of non-canonical NaN gives valid large floats

Hi there!

fsqrt was known to produce some off-by-one errors and may return 1 on a canonical NaN.

This snippet shows that, in addition to these known problems, fsqrt.s can return large valid floating-point numbers instead of NaN when the input is a non-canonical NaN. Hence, this is a different bug from the reported off-by-one; I do not know at the moment whether the root cause is shared.

  .section ".text.init","ax",@progbits
  .globl _start
  .align 2
_start:

  # Enable the FPU
  li t0, 0x2000
  csrs mstatus, t0
  csrw	fcsr,x0

  la t0, .fdata0
  fld ft0, (t0)

  fsqrt.s ft2, ft0

infinite_loop:
  j infinite_loop

.section ".fdata0","ax",@progbits
  .8byte 0xc5d584907716dd07

We get ft2=0x5b7c271f, which corresponds to 7.09747e+16, instead of ft2=0x7fc00000 = nan.
Tested with Ariane with CVFPU 0.7.0 (109f9e9ed3adff25464db3aa021cb88119b7bf53).

Thanks!
Flavien

Bug report: Floating conversion from double to unsigned int may provide wrong result

Hi there!

I've detected a bug in CVFPU.

Brief bug description

A conversion from double to unsigned int (fcvt.wu.d) produces wrong results in some cases.
I discovered the bug through CVA6.

Example instance 1

Here is an example RISC-V (rv64imfd) snippet:

  .section ".text.init","ax",@progbits
  .globl _start
  .align 2
_start:

  # Enable the FPU
  li t0, 0x2000
  csrs mstatus, t0
  csrw	fcsr,x0

  la t0, .fdata0
  fld ft0, (t0)

  fcvt.wu.d a0, ft0, rup

  li t0, 0x10
  sd a0, (t0)

  sw x0, 0(x0)

infinite_loop:
  j infinite_loop

.section ".fdata0","ax",@progbits
  .8byte 0x41dfffffffc00001 # 2.14748e+09

Expected and actual results

We expect a0=0xffffffff80000000. I verified this with Spike.
However, CVA6 gives a0=0x80000000.

Example instance 2

Here is an example RISC-V (rv64imfd) snippet:

  .section ".text.init","ax",@progbits
  .globl _start
  .align 2
_start:

  # Enable the FPU
  li t0, 0x2000
  csrs mstatus, t0
  csrw	fcsr,x0

  la t0, .fdata0
  fld ft0, (t0)

  fcvt.wu.d a0, ft0, rmm

  li t0, 0x10
  sd a0, (t0)

  sw x0, 0(x0)

infinite_loop:
  j infinite_loop

.section ".fdata0","ax",@progbits
  .8byte 0x41efffffffffffff # 4.29497e+09

Expected and actual results

We expect a0=0xffffffffffffffff. I verified this with Spike.
However, CVA6 gives a0=0.
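Both cases can be reproduced with a small Python reference model of fcvt.wu.d on RV64 (a sketch under the assumption of the standard semantics: round in the requested mode, saturate to the unsigned 32-bit range, then sign-extend the 32-bit result into the 64-bit destination register):

```python
import math
import struct

def fcvt_wu_d(bits64, rm):
    # Sketch of RV64 fcvt.wu.d semantics: round the double in the requested
    # mode, saturate on overflow (NV is also raised, not modelled here),
    # then sign-extend bit 31 of the result into the upper register half.
    x = struct.unpack('<d', bits64.to_bytes(8, 'little'))[0]
    if rm == 'rup':
        r = math.ceil(x)
    elif rm == 'rmm':
        r = math.floor(x + 0.5)  # ties-away; sufficient for these inputs
    else:
        raise NotImplementedError(rm)
    r = min(max(r, 0), 0xffffffff)
    return r | (0xffffffff00000000 if r & 0x80000000 else 0)

assert fcvt_wu_d(0x41dfffffffc00001, 'rup') == 0xffffffff80000000
assert fcvt_wu_d(0x41efffffffffffff, 'rmm') == 0xffffffffffffffff
```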

Thanks!
Flavien

Bug report: multifmt slice uses wrong FP width for third operand

Hi,

the fpnew_opgroup_multifmt_slice uses the wrong FP width for the third operand.

AFAIK the third operand is only used by the ADDMUL opgroup, which expects operands_i[0] and operands_i[1] to have the FP format specified by src_fmt_i, while operands_i[2] and the result have the FP format dst_fmt_i:

input fpnew_pkg::fp_format_e src_fmt_i, // format of the multiplicands
input fpnew_pkg::fp_format_e dst_fmt_i, // format of the addend and result

Accordingly, the fpnew_opgroup_multifmt_slice should use the width of FP format dst_fmt_i when assigning the individual elements of operands_i[2] to the lane instances. However, the width of src_fmt_i is currently used for all operands:

for (int unsigned i = 0; i < NUM_OPERANDS; i++) begin
  local_operands[i] = operands_i[i] >> LANE*fpnew_pkg::fp_width(src_fmt_i);
end

As a result, the third operand is incorrect for all lanes except the first, as shown in the following waveform. The first group of signals shows src_fmt_i (FP16) and dst_fmt_i (FP32), as well as operands_i[0] (for reference) and operands_i[2] of the fpnew_opgroup_multifmt_slice. The 16-bit elements of operands_i[0] are correctly assigned to the individual fpnew_fma_multi instances (see the signal groups below, which show the input operands of the first three instances). However, the 32-bit elements of operands_i[2] are not assigned correctly. The individual elements of these operands are outlined with rectangles of different colors.

[annotated waveform: fpnew_bug_annotated]
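The effect is easy to reproduce with a toy model of the packed-operand slicing (hypothetical helper, mirroring the RTL's shift): with src_fmt FP16 and dst_fmt FP32, shifting operands_i[2] by the source width hands every lane but the first a garbled element.

```python
def lane_slice(packed, lane, width_bits):
    # Extract one lane's element from a packed operand word.
    return (packed >> (lane * width_bits)) & ((1 << width_bits) - 1)

packed_op2 = 0xDDDDDDDDAAAAAAAA   # two FP32 elements: lane 0, lane 1
FP16_W, FP32_W = 16, 32

# Correct: the addend must be sliced with the destination width.
assert lane_slice(packed_op2, 1, FP32_W) == 0xDDDDDDDD
# Buggy RTL: shifting by the source width misaligns lane 1.
wrong = (packed_op2 >> (1 * FP16_W)) & 0xFFFFFFFF
assert wrong == 0xDDDDAAAA
```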

Bug report: Under some microarchitectural circumstances, NaN conversion from single to double precision gives wrong result

Hi there!

I've detected a bug in CVA6, probably in CVFPU but I'm not certain. I initially posted the issue in the CVA6 repo before moving it here. I used the commit 17743bc7120f1eb24974e5d7eb7f519ef53c4bdc of CVA6.

Brief bug description

A conversion of NaN from single to double precision may unexpectedly set many bits in the mantissa under specific microarchitectural conditions.

Example instance 1

In this instance, we convert the single-precision 0xffffffffff800000 using fcvt.d.s.
We expect to get 0xfff0000000000000 as a result, but under the circumstances induced by the very simple ELF, we get 0xffefffffffffffff. I confirmed the expected result with Spike, and also with the standalone snippet further below running on CVA6.

Here is the ELF and waveforms of the bug.
A symptom of the bug is the first write to address 0x18, which essentially writes the result of the conversion.

Here you can observe that the value dumped is erroneous.

[waveform screenshot]

Example instance 2

In this instance, we convert the single-precision 0xffffffff7f800000 using fcvt.d.s.
We expect to get 0x7ff0000000000000 as a result, but under the circumstances induced by the very simple ELF, we get 0x7fefffffffffffff. I confirmed the expected result with Spike, and also with the standalone snippet further below running on CVA6.

Here is the ELF and waveforms of the bug.
A symptom of the bug is the sixth write to address 0x18, which essentially writes the result of the conversion.

[waveform screenshot]
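Both expected values can be double-checked in Python (the host's float-to-double widening performs the same, always exact, conversion for these inputs; the upper 0xffffffff words in the register dumps are just the NaN-boxing of the 32-bit input):

```python
import struct

def fcvt_d_s(bits32):
    # Widen a binary32 bit pattern to binary64 (exact for finite/inf inputs).
    f = struct.unpack('<f', bits32.to_bytes(4, 'little'))[0]
    return int.from_bytes(struct.pack('<d', f), 'little')

assert fcvt_d_s(0xff800000) == 0xfff0000000000000   # -inf -> -inf (instance 1)
assert fcvt_d_s(0x7f800000) == 0x7ff0000000000000   # +inf -> +inf (instance 2)
```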

Example instance that runs correctly

Interestingly, performing the conversion out of context works: the following snippet, for example, executes correctly. The same holds if we substitute the values from the second instance.

  .section ".text.init","ax",@progbits
  .globl _start
  .align 2
_start:

  # Enable the FPU
  li t0, 0x2000
  csrs mstatus, t0
  csrw	fcsr,x0

  la t0, .fdata0
  fld fa5, (t0)

  fcvt.d.s ft0, fa5

  li t0, 0x18
  fsd ft0, (t0)

  sw x0, 0(x0)

infinite_loop:
  j infinite_loop

.section ".fdata0","ax",@progbits
  .8byte 0xffffffffff800000

Therefore I deduce that the bug is triggered by some microarchitectural effect (perhaps branch prediction, but I am not certain).

Thanks!
Flavien

Conversion from FP32 -2^31 to INT32 sets NV status

In v0.7.0, converting the FP32 value -2147483648.0 to INT32 produces the correct result 0x80000000 but also sets the NV status flag. However, the input value is exactly representable as an INT32, so NV should not be raised.

Test case:

  operands_i = '{0: 32'hcf000000, 1: 32'h00000000, 2: 32'h00000000};
  rnd_mode_i = RNE;
  op_i = F2I;
  op_mod_i = 1'b0;
  src_fmt_i = FP32;
  int_fmt_i = INT32;

Root cause:
These lines do not take into account that for this valid input value the exponent is exactly 31. When the sign is negative and the significand is exactly 0x80000000 in this case, there is no overflow.
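A sketch of the corrected overflow predicate (hypothetical helper names; exp is the unbiased exponent, sig_is_pow2 means the significand is exactly 1.0):

```python
def f2i32_overflows(sign, exp, sig_is_pow2):
    # Signed INT32 covers [-2^31, 2^31 - 1]. An exponent of 31 overflows
    # unless the value is exactly -2^31 (negative sign, significand 1.0).
    if exp > 31:
        return True
    if exp == 31:
        return not (sign and sig_is_pow2)
    return False

assert f2i32_overflows(1, 31, True) is False   # -2^31 -> 0x80000000, no NV
assert f2i32_overflows(0, 31, True) is True    # +2^31 does not fit in INT32
```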

Operands alignment error

I ran 4 instructions, each from a different operation block (ADDMUL, DIVSQRT, NONCOMP, and CONV). I know that the ADDMUL block takes three input operands, operands_i[2:0][31:0], in little-endian order.
The problem I am seeing is with a simple FADD.S instruction: it should add operands_i[1][31:0] and operands_i[0][31:0], but it actually adds operands_i[2][31:0] and operands_i[1][31:0].
Can someone clarify this, please? Do I have to supply rs1 and rs2 in slots [2] and [1], or is there another way?
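For reference (based on the fpnew documentation; treat this as an assumption to verify against your version): this appears to be intended behaviour. ADD is mapped onto the FMA datapath as operands_i[1] + operands_i[2], with operands_i[0] (the multiplicand slot) unused. A hypothetical sketch of the resulting operand packing for FADD.S:

```python
def pack_fadd_operands(rs1, rs2):
    # fpnew computes ADD as operands_i[1] + operands_i[2] on the FMA
    # datapath; slot [0] is a don't-care for ADD (assumption from the docs).
    return [0x00000000, rs1, rs2]   # operands_i[0], [1], [2]

ops = pack_fadd_operands(0x3f800000, 0x3f800000)   # 1.0f + 1.0f
assert ops[1] == 0x3f800000 and ops[2] == 0x3f800000
```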

Always NaN output on Vivado

I'm simulating the FPU inside Vivado 2020.1. The testbench is this:

module tb_top;
  localparam fpnew_pkg::fpu_features_t Features = fpnew_pkg::RV32F;
  localparam fpnew_pkg::fpu_implementation_t Implementation = fpnew_pkg::DEFAULT_NOREGS;
  localparam int unsigned WIDTH = Features.Width;
  localparam int unsigned NUM_OPERANDS = 3;
  localparam type TagType = logic;

  /* input */logic                                                  clk_i = 0;
  /* input */logic                                                  rst_ni = 0;

  /* input */logic                    [NUM_OPERANDS-1:0][WIDTH-1:0] operands_i = 0;
  /* input */fpnew_pkg::roundmode_e                                rnd_mode_i;
  /* input */fpnew_pkg::operation_e                                op_i;
  /* input */logic                                                  op_mod_i;
  /* input */fpnew_pkg::fp_format_e                                src_fmt_i;
  /* input */fpnew_pkg::fp_format_e                                dst_fmt_i;
  /* input */fpnew_pkg::int_format_e                               int_fmt_i;
  /* input */logic                                                  vectorial_op_i = 0;
  /* input */TagType                                                tag_i;

  /* input */logic                                                  in_valid_i = 0;
  /* output */logic                                                  in_ready_o;
  /* input  */logic                                                  flush_i = 0;
  /* output */logic                    [       WIDTH-1:0]            result_o;

  /* output */fpnew_pkg::status_t                                   status_o;
  /* output */TagType                                    tag_o;
  /* output */logic                                                  out_valid_o;

  /* input */logic                                                  out_ready_i = 1;

  /* output */logic                                                  busy_o;


  fpnew_top DUT_i (.*);

  /* Clock Generation */
  always #5 clk_i = ~clk_i;

  initial begin
    rnd_mode_i = fpnew_pkg::RNE;
    op_mod_i = 0;
    op_i = fpnew_pkg::ADD;
    src_fmt_i = fpnew_pkg::FP32;
    dst_fmt_i = fpnew_pkg::FP32;
    int_fmt_i = fpnew_pkg::INT32;
    tag_i = 0;

    operands_i[0] = 'h3f800000;
    operands_i[1] = 'h3f800000;
    operands_i[2] = 'h3f800000;
    flush_i = 1;

    for (integer i = 0; i < 10; i++) begin
      @(posedge clk_i);
    end

    /* Deassert reset */
    rst_ni  = 1;
    flush_i = 0;
    @(posedge clk_i);
    @(posedge clk_i);
    @(posedge clk_i);
    @(posedge clk_i);
    in_valid_i = 1;
    @(posedge clk_i);
    @(posedge clk_i);
    in_valid_i = 0;
    for (integer i = 0; i < 10; i++) begin
      @(posedge clk_i);
    end
    $finish;
  end

endmodule

The module produces the following result:

[simulation screenshot, 2021-12-23]

The result is a NaN. Can someone please walk me through the issue? Is this a Vivado problem?

Underflow flag issue

Hello,
We were testing the floating-point unit and encountered an issue: during fadd and fsub operations the underflow flag is never raised, whereas fmul, fmadd, fmsub, fnmadd and fnmsub raise the underflow flag whenever the result is subnormal.

Can you explain the logic behind this?

Regards:
Hamza Shabbir
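For reference: IEEE 754 raises the underflow flag only when the result is both tiny and inexact. An addition or subtraction whose rounded result lands in the subnormal range is always exact, so fadd/fsub can never underflow, while a multiply or FMA can produce a tiny, inexact result. A Python check of the exactness claim on one example:

```python
from fractions import Fraction
import struct

def bits_to_double(h):
    return struct.unpack('<d', h.to_bytes(8, 'little'))[0]

a = bits_to_double(0x0010000000000000)   # smallest normal, 2^-1022
b = bits_to_double(0x0000000000000001)   # smallest subnormal, 2^-1074
s = a - b                                # result is subnormal...
assert 0 < s < a                         # ...tiny,
assert Fraction(s) == Fraction(a) - Fraction(b)   # ...but exact: no UF
```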

Facing fatal error while running floating point example

Hi,

I am running the floating-point example in the directory /home/pbhatt82/core-v-verif/core-v-cores/cv32e40p/example_tb/core and I am getting the error below.

Path : /home/pbhatt82/core-v-verif/core-v-cores/cv32e40p/example_tb/core
Command : make calculator-fp-vsim-run-gui

[error screenshot]

Multi-cycle Path

Hi,

Are there any multi-cycle paths in the design? The critical paths for some instructions are very long.

Thanks.

Bug report: Floating-point conversion of infinity from double to single sets many ones in the mantissa

Hi there!

I've detected a bug in CVFPU.

Brief bug description

A conversion of +inf from double to single precision unexpectedly sets many bits in the mantissa.
I found it through CVA6.

Example instance

Here is an example RISC-V (rv64imfd) snippet:

  .section ".text.init","ax",@progbits
  .globl _start
  .align 2
_start:

  # Enable the FPU
  li t0, 0x2000
  csrs mstatus, t0
  csrw	fcsr,x0

  la t0, .fdata0
  fld ft0, (t0)

  fcvt.s.d ft1, ft0, rdn

  li t0, 0x18
  fsd ft1, (t0)

  sw x0, 0(x0)

infinite_loop:
  j infinite_loop

.section ".fdata0","ax",@progbits
  .8byte 0x7ff0000000000000

Expected and actual results

We expect ft1=0xffffffff7f800000. I verified this with Spike.
However, CVA6 gives ft1=0xffffffff7f7fffff.
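The expected value can be checked on the host in a few lines (the narrowing is exact for infinities regardless of the rounding mode; the upper 0xffffffff word is the NaN-boxing that RISC-V requires for a 32-bit value in a 64-bit f-register):

```python
import struct

def fcvt_s_d(bits64):
    # Narrow a binary64 pattern to binary32, then NaN-box the 32-bit
    # result into the 64-bit f-register as RISC-V requires.
    d = struct.unpack('<d', bits64.to_bytes(8, 'little'))[0]
    s = int.from_bytes(struct.pack('<f', d), 'little')
    return 0xffffffff00000000 | s

assert fcvt_s_d(0x7ff0000000000000) == 0xffffffff7f800000   # +inf stays +inf
```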

Thanks!
Flavien

FDIV and FSQRT busy

I am using fpnew with the following parameters; the other parameters are left at their defaults.

      .Features       ( fpnew_pkg::RV32F          ),
      .Implementation ( fpnew_pkg::DEFAULT_NOREGS )

I am facing a problem handling the out_valid_o and busy_o signals. I use the combination assign fpu_busy_idu = fp_busy & (~out_valid_fpu2c); to stall the processor while the FPU is working.

Problem:
When executing an FSQRT.S or FDIV.S instruction, fpnew keeps its busy signal high for several more cycles, even though we already get the correct result with the first valid signal.
[waveform screenshot]

In the waveform, busy_o is fp_busy and out_valid_o is out_valid_fpu2c.

SIGSEGV on Vivado

Hi pulp-team,
I am experiencing segmentation faults while using your FPU in Xilinx Vivado.
Under Vivado 2018.2, I created a very simple testbench that drives the FPU input signals and instantiated the top module 'fpnew_top' in it.
After that, I configured the module to implement 32-bit FP operations.
The compilation hangs during the elaboration phase.
I attach my elaborate.log for your convenience.
elaborate.log

Kind regards,
Giovanni
