cv32e40p's Introduction

OpenHW Group CORE-V CV32E40P RISC-V IP

CV32E40P is a small and efficient 32-bit, in-order RISC-V core with a 4-stage pipeline. It implements the RV32IM[F|Zfinx]C instruction set architecture and the PULP custom extensions for higher code density, performance, and energy efficiency [1], [2]. It started its life as a fork of the OR10N CPU core, which is based on the OpenRISC ISA. Under the name RI5CY it became a RISC-V core in 2016, and it was maintained by the PULP platform team until February 2020, when it was contributed to OpenHW Group.

Documentation

The CV32E40P user manual can be found in the docs folder. It is written in reStructuredText and rendered to HTML using Sphinx. The rendered documents can be viewed on readthedocs.

Verification

The verification environment for the CV32E40P is not in this repository. There is a small, simple testbench here, which is useful for experimentation only and should not be used to validate any changes to the RTL prior to pushing to the master branch of this repo.

The verification environment for this core as well as other cores in the OpenHW Group CORE-V family is at the core-v-verif repository on GitHub.

The Makefiles supported in the core-v-verif project automatically clone the appropriate version of the cv32e40p RTL sources.

Changelog

A changelog is generated automatically in the documentation from the individual pull requests. In order to enable automatic changelog generation within the CV32E40P documentation, the committer is required to label each pull request that touches any file in 'rtl' (or any of its subdirectories) with Component:RTL and label each pull request that touches any file in 'docs' (or any of its subdirectories) with Component:Doc. Pull requests that are not labeled or labeled with ignore-for-release are ignored for the changelog generation.

Only the person who actually performs the merge can add these labels (you need committer rights). The changelog flow only works if at most one label is applied, so pull requests that touch both RTL and documentation files are not allowed.
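The labeling rule above can be sketched as a small helper. This is only an illustration of the rule as described here, not the project's actual automation; the function name `changelog_label` and the example paths are hypothetical.

```python
def changelog_label(changed_paths):
    """Pick the changelog label for a pull request from the paths it touches.

    Returns "Component:RTL", "Component:Doc", or None (the PR is ignored by
    the changelog). Raises ValueError when a PR mixes 'rtl' and 'docs'
    changes, since at most one label may be applied.
    """
    touches_rtl = any(p.startswith("rtl/") for p in changed_paths)
    touches_docs = any(p.startswith("docs/") for p in changed_paths)
    if touches_rtl and touches_docs:
        raise ValueError("a pull request must not touch both 'rtl' and 'docs'")
    if touches_rtl:
        return "Component:RTL"
    if touches_docs:
        return "Component:Doc"
    return None


# Hypothetical example paths, for illustration only.
print(changelog_label(["rtl/example_module.sv"]))
print(changelog_label(["docs/source/example.rst"]))
print(changelog_label(["README.md"]))
```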

Constraints

Example synthesis constraints for the CV32E40P are provided.

Contributing

We highly appreciate community contributions. We are currently using the lowRISC contribution guide. To ease our work of reviewing your contributions, please:

  • Create your own fork to commit your changes and then open a Pull Request to the dev branch.
  • Split large contributions into smaller commits addressing individual changes or bug fixes. Do not mix unrelated changes into the same commit!
  • Do not mix updates within the 'rtl' directory with updates within the 'docs' directory in the same pull request.
  • Write meaningful commit messages. For more information, please check out the Ibex contribution guide.
  • If asked to modify your changes, please fix up your commits and rebase your branch to maintain a clean history.
  • If the PR is accepted and merged into the dev branch, an action is triggered automatically to check whether the changes are logically equivalent to the frozen RTL for a given set of parameters. If they are, the dev branch is automatically merged into the master branch. Otherwise, we need to investigate manually. If a bug is found and the changes are therefore not logically equivalent, we follow the procedure documented here.

For more details on how this is implemented, have a look at this page.

When contributing SystemVerilog source code, please try to be consistent and adhere to the lowRISC Verilog coding style guide.

To get started, please check out the "Good First Issue" list.

The RTL code has been formatted with "Verible" v0.0-1149-g7eae750. Run ./util/format-verible to format all the files.

Issues and Troubleshooting

If you find any problems or issues with CV32E40P or the documentation, please check out the issue tracker and create a new issue if your problem is not yet tracked.

References

  1. Gautschi, Michael, et al. "Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices." in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 10, pp. 2700-2713, Oct. 2017

  2. Schiavone, Pasquale Davide, et al. "Slow and steady wins the race? A comparison of ultra-low-power RISC-V cores for Internet-of-Things applications." 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS 2017)

cv32e40p's People

Contributors

andreaskurth, antmas, atraber, bluewww, davideschiavone, dawidzim, fabianschuiki, florent-gwt, francescoconti, gautschimi, gmarkall, haugoug, jeremybennett, jm4rtin, lucabertaccini, mikeopenhwgroup, mp-17, owenchj0, pascalgouedo, razer6, samuelriedel, silabs-arjanb, silabs-oysteink, silabs-paulz, stmach, strichmo, suppamax, svenstucki, wallento, yoannpruvost

cv32e40p's Issues

The least-significant bit of JALR target address should be set to zero

The RISC-V Instruction Set Manual: Volume I, Version 2.0, page 15, section 2.5 (Control Transfer Instructions) states:

The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target
address is obtained by adding the 12-bit signed I-immediate to the register rs1, then setting the
least-significant bit of the result to zero.

I found id_stage.sv line 433:

JT_JALR: jump_target = regfile_data_ra_id + imm_i_type;

It does not set the least-significant bit of the result to zero.

Maybe you do this somewhere else and I have not found it?
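For reference, the behavior quoted from the manual can be modeled in a few lines. This is a behavioral sketch of the spec text only, not of the RTL; the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_12(imm12):
    """Interpret the low 12 bits of imm12 as a signed I-immediate."""
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12


def jalr_target(rs1_value, imm12):
    """JALR target as quoted above: rs1 + imm, then bit 0 forced to zero."""
    return (rs1_value + sign_extend_12(imm12)) & MASK32 & ~1


def jalr_target_unmasked(rs1_value, imm12):
    """What a plain 'rs1 + imm' computes, i.e. without clearing bit 0."""
    return (rs1_value + sign_extend_12(imm12)) & MASK32


# An odd base address shows the difference between the two variants.
print(hex(jalr_target(0x1001, 4)), hex(jalr_target_unmasked(0x1001, 4)))
```

The two variants only differ when the sum is odd, so the issue would not show up in code that always jumps to even addresses.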

can't run 'make all'

Following the README file, I ran "make all" in docs/datasheet/,
but got the following errors:
make all


***** Printing Tgif figure:


***** ./figures_raw/events.eps
make: tgif: No such file or directory
make: *** [figures/events.pdf] Error 1
What should I do?

Nested Exception Support

Hello

I'm not sure if I am interpreting the manual for this core correctly, I hope you can help.

Section 12.3 talks about exception handling, specifically it says:

RI5CY does support nested interrupt/exception handling. Exceptions inside interrupt/exception
handlers cause another exception, thus exceptions during the critical part of your exception handlers, i.e.
before having saved the MEPC and MESTATUS registers, will cause those register to be overwritten.

(emphasis mine)

To my understanding, to support nested exception handling (the RISC-V priv ISA spec calls it nested trap support: 3.1.7 of v1.10), one must include the xPIE and xPP fields of the MSTATUS CSR in order to properly recover the privilege mode and interrupt enabled states. It does not detail a hardware mechanism for saving MSTATUS and MEPC.

There is nothing to indicate hardware support for nested saving of the MEPC or the MSTATUS registers in the RI5CY user manual. This in my opinion is what explicit support for nested exception handling would be. Otherwise, as the manual says, the MEPC and MSTATUS registers are clobbered immediately by the first exception you get inside your handler, and unless you have already saved them via software, you can no longer get back to the code which caused the first exception.

So my question is this: does RI5CY support hardware assisted nested exception handling in machine mode, or does it only support nested exception handling the way any other core might, by requiring the MEPC and MSTATUS to be saved in time for any other exception which might occur?

Thanks, and let me know if anything doesn't make sense!

Ben

if rs1=x0 both CSRRS and CSRRC will not write to the CSR at all

In RISCV privileged specification v1.7 2.1 Instructions to access CSRs:

For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all,
and so shall not cause any of the side effects that might otherwise occur on a CSR write. Note
that if rs1 specifies a register holding a zero value other than x0, the instruction will still write the
unmodified value back to the CSR.

In cs_registers.sv, csr_we_int depends only on csr_op_i.
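The rule quoted above can be written as a small behavioral model. It is a sketch of the spec text, not of cs_registers.sv; the function name and the tuple it returns are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF


def csr_execute(op, csr_old, rs1_index, rs1_value):
    """Return (value read into rd, new CSR value, whether a CSR write occurred)."""
    if op == "CSRRW":
        return csr_old, rs1_value & MASK32, True
    if op not in ("CSRRS", "CSRRC"):
        raise ValueError(f"unsupported op: {op}")
    # Write suppression depends on the register *index* being x0,
    # not on the register holding the value zero.
    if rs1_index == 0:
        return csr_old, csr_old, False
    if op == "CSRRS":
        new_value = csr_old | rs1_value
    else:
        new_value = csr_old & ~rs1_value
    return csr_old, new_value & MASK32, True


# rs1 = x0: no write. rs1 = x5 holding zero: the unmodified value is still written.
print(csr_execute("CSRRS", 0x1880, 0, 0))
print(csr_execute("CSRRS", 0x1880, 5, 0))
```

A write enable derived from the operation alone cannot distinguish these two cases, because both leave the CSR value unchanged but only the second counts as a write.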

Execution hang with hardware loops and timer interrupts

An execution hang is observed with a cross product of external timer interrupts and hardware loops. The core trace is given below. The last instruction of the hardware loop (pc = 0x349a) is lbu x24, 23(x14). When the execution resumes from the interrupt handler at that pc, the opcode executed is different [lbu x24, 7(x0)], resulting in data corruption. The corrupt data possibly propagates downstream and causes a hang in execution. Note that the handler is being entered at either 0x2074 or 0x207C depending on the type of interrupt i.e. timer A compare or timer B compare interrupt.

873960000    21832 0000346e 0060007b lp.starti     0x0, 0x6           
874000000    21833 00003472 0140107b lp.endi       0x0, 0x14          
874040000    21834 00003476 0070307b lp.counti     x0, 0x7            
874080000    21835 0000347a 007543b3 xor           x7, x10, x7          x7=181e1b87 x10:18196ba7  x7:00077020
874120000    21836 0000347e fe7a0503 lb            x10, -25(x20)       x10=00000077 x20:00030a70  PA:00030a57
874160000    21837 00003482 00cda9a3 sw            x12, 19(x27)        x12:0000238c x27:00059e30  PA:00059e43
874920000    21856 00002074 2640006f jal           x0, 612            
874960000    21857 000022d8 34202ef3 csrrs         x29, x0, 0x342      x29=8000001d
875000000    21858 000022dc 30002ef3 csrrs         x29, x0, 0x300      x29=00001880
875160000    21862 000022e0 34102ef3 csrrs         x29, x0, 0x341      x29=00003486
875200000    21863 000022e4 1a104eb7 lui           x29, 0x1a104000     x29=1a104000
875240000    21864 000022e8 004eaf03 lw            x30, 4(x29)         x30=20000000 x29:1a104000  PA:1a104004
875280000    21865 000022ec 20000fb7 lui           x31, 0x20000000     x31=20000000
875480000    21870 000022f0 01fea623 sw            x31, 12(x29)        x31:20000000 x29:1a104000  PA:1a10400c
875520000    21871 000022f4 30200073 mret                             
875920000    21881 00003486 01868cb3 add           x25, x13, x24       x25=00000000 x13:00000000 x24:00000000
875960000    21882 0000348a 007585b3 add           x11, x11, x7        x11=3037872e x11:18196ba7  x7:181e1b87
876000000    21883 0000348e 00b5f5b3 and           x11, x11, x11       x11=3037872e x11:3037872e x11:3037872e
876040000    21884 00003492 ff8d0523 sb            x24, -22(x26)       x24:00000000 x26:0006a0f0  PA:0006a0da
876080000    21885 00003496 027563b3 rem           x7, x10, x7          x7=00000077 x10:00000077  x7:181e1b87
876320000    21891 0000349a 01774c03 lbu           x24, 23(x14)        x24=00000000 x14:00065bd0  PA:00065be7
876360000    21892 0000347a 007543b3 xor           x7, x10, x7          x7=00000000 x10:00000077  x7:00000077
876560000    21897 0000347e fe7a0503 lb            x10, -25(x20)       x10=00000077 x20:00030a70  PA:00030a57
876600000    21898 00003482 00cda9a3 sw            x12, 19(x27)        x12:0000238c x27:00059e30  PA:00059e43
877120000    21911 00003486 01868cb3 add           x25, x13, x24       x25=00000000 x13:00000000 x24:00000000
877360000    21917 0000348a 007585b3 add           x11, x11, x7        x11=3037872e x11:3037872e  x7:00000000
877400000    21918 0000348e 00b5f5b3 and           x11, x11, x11       x11=3037872e x11:3037872e x11:3037872e
877440000    21919 00003492 ff8d0523 sb            x24, -22(x26)       x24:00000000 x26:0006a0f0  PA:0006a0da
877480000    21920 00003496 027563b3 rem           x7, x10, x7          x7=00000077 x10:00000077  x7:00000000
878880000    21955 0000349a 01774c03 lbu           x24, 23(x14)        x24=00000000 x14:00065bd0  PA:00065be7
878920000    21956 0000347a 007543b3 xor           x7, x10, x7          x7=00000000 x10:00000077  x7:00000077
879120000    21961 0000347e fe7a0503 lb            x10, -25(x20)       x10=00000077 x20:00030a70  PA:00030a57
879160000    21962 00003482 00cda9a3 sw            x12, 19(x27)        x12:0000238c x27:00059e30  PA:00059e43
879680000    21975 00003486 01868cb3 add           x25, x13, x24       x25=00000000 x13:00000000 x24:00000000
879920000    21981 0000348a 007585b3 add           x11, x11, x7        x11=3037872e x11:3037872e  x7:00000000
879960000    21982 0000348e 00b5f5b3 and           x11, x11, x11       x11=3037872e x11:3037872e x11:3037872e
880000000    21983 00003492 ff8d0523 sb            x24, -22(x26)       x24:00000000 x26:0006a0f0  PA:0006a0da
880040000    21984 00003496 027563b3 rem           x7, x10, x7          x7=00000077 x10:00000077  x7:00000000
881480000    22020 00002074 2640006f jal           x0, 612            
881560000    22022 000022d8 34202ef3 csrrs         x29, x0, 0x342      x29=8000001d
881600000    22023 000022dc 30002ef3 csrrs         x29, x0, 0x300      x29=00001880
881760000    22027 000022e0 34102ef3 csrrs         x29, x0, 0x341      x29=0000349a
881800000    22028 000022e4 1a104eb7 lui           x29, 0x1a104000     x29=1a104000
881840000    22029 000022e8 004eaf03 lw            x30, 4(x29)         x30=20000000 x29:1a104000  PA:1a104004
881880000    22030 000022ec 20000fb7 lui           x31, 0x20000000     x31=20000000
882080000    22035 000022f0 01fea623 sw            x31, 12(x29)        x31:20000000 x29:1a104000  PA:1a10400c
882120000    22036 000022f4 30200073 mret                             
882520000    22046 0000349a 00704c03 lbu           x24, 7(x0)          x24=000000xx  PA:00000007
882560000    22047 0000347a 007543b3 xor           x7, x10, x7          x7=00000000 x10:00000077  x7:00000077
882760000    22052 0000347e fe7a0503 lb            x10, -25(x20)       x10=00000077 x20:00030a70  PA:00030a57
882800000    22053 00003482 00cda9a3 sw            x12, 19(x27)        x12:0000238c x27:00059e30  PA:00059e43
883320000    22066 00003486 01868cb3 add           x25, x13, x24       x25=xxxxxxxx x13:00000000 x24:000000xx
883560000    22072 0000348a 007585b3 add           x11, x11, x7        x11=3037872e x11:3037872e  x7:00000000
883600000    22073 0000348e 00b5f5b3 and           x11, x11, x11       x11=3037872e x11:3037872e x11:3037872e
883640000    22074 00003492 ff8d0523 sb            x24, -22(x26)       x24:000000xx x26:0006a0f0  PA:0006a0da
883680000    22075 00003496 027563b3 rem           x7, x10, x7          x7=00000077 x10:00000077  x7:00000000
885080000    22110 0000349a 01774c03 lbu           x24, 23(x14)        x24=00000000 x14:00065bd0  PA:00065be7
885120000    22111 0000347a 007543b3 xor           x7, x10, x7          x7=00000000 x10:00000077  x7:00000077
885320000    22116 0000347e fe7a0503 lb            x10, -25(x20)       x10=00000077 x20:00030a70  PA:00030a57
885360000    22117 00003482 00cda9a3 sw            x12, 19(x27)        x12:0000238c x27:00059e30  PA:00059e43
885880000    22130 00003486 01868cb3 add           x25, x13, x24       x25=00000000 x13:00000000 x24:00000000
886120000    22136 0000348a 007585b3 add           x11, x11, x7        x11=3037872e x11:3037872e  x7:00000000
886160000    22137 0000348e 00b5f5b3 and           x11, x11, x11       x11=3037872e x11:3037872e x11:3037872e
886200000    22138 00003492 ff8d0523 sb            x24, -22(x26)       x24:00000000 x26:0006a0f0  PA:0006a0da
886240000    22139 00003496 027563b3 rem           x7, x10, x7          x7=00000077 x10:00000077  x7:00000000
887640000    22174 0000349a 01774c03 lbu           x24, 23(x14)        x24=00000000 x14:00065bd0  PA:00065be7
887680000    22175 0000347a 007543b3 xor           x7, x10, x7          x7=00000000 x10:00000077  x7:00000077
887880000    22180 0000347e fe7a0503 lb            x10, -25(x20)       x10=00000077 x20:00030a70  PA:00030a57
887920000    22181 00003482 00cda9a3 sw            x12, 19(x27)        x12:0000238c x27:00059e30  PA:00059e43
888680000    22200 00002074 2640006f jal           x0, 612            
888720000    22201 000022d8 34202ef3 csrrs         x29, x0, 0x342      x29=8000001d
888760000    22202 000022dc 30002ef3 csrrs         x29, x0, 0x300      x29=00001880
888920000    22206 000022e0 34102ef3 csrrs         x29, x0, 0x341      x29=00003486
888960000    22207 000022e4 1a104eb7 lui           x29, 0x1a104000     x29=1a104000
889000000    22208 000022e8 004eaf03 lw            x30, 4(x29)         x30=20000000 x29:1a104000  PA:1a104004
889040000    22209 000022ec 20000fb7 lui           x31, 0x20000000     x31=20000000
889240000    22214 000022f0 01fea623 sw            x31, 12(x29)        x31:20000000 x29:1a104000  PA:1a10400c
889280000    22215 000022f4 30200073 mret                             
889680000    22225 00003486 01868cb3 add           x25, x13, x24       x25=00000000 x13:00000000 x24:00000000
889720000    22226 0000348a 007585b3 add           x11, x11, x7        x11=3037872e x11:3037872e  x7:00000000
889760000    22227 0000348e 00b5f5b3 and           x11, x11, x11       x11=3037872e x11:3037872e x11:3037872e
889800000    22228 00003492 ff8d0523 sb            x24, -22(x26)       x24:00000000 x26:0006a0f0  PA:0006a0da
889840000    22229 00003496 027563b3 rem           x7, x10, x7          x7=00000077 x10:00000077  x7:00000000
891280000    22265 0000207c 29c0006f jal           x0, 668            
891360000    22267 00002318 34202ef3 csrrs         x29, x0, 0x342      x29=8000001f
891400000    22268 0000231c 30002ef3 csrrs         x29, x0, 0x300      x29=00001880
891560000    22272 00002320 34102ef3 csrrs         x29, x0, 0x341      x29=0000349a
891600000    22273 00002324 1a104eb7 lui           x29, 0x1a104000     x29=1a104000
891640000    22274 00002328 004eaf03 lw            x30, 4(x29)         x30=80000000 x29:1a104000  PA:1a104004
891680000    22275 0000232c 80000fb7 lui           x31, 0x80000000     x31=80000000
891880000    22280 00002330 01fea623 sw            x31, 12(x29)        x31:80000000 x29:1a104000  PA:1a10400c
891920000    22281 00002334 30200073 mret                             
892320000    22291 0000349a 00704c03 lbu           x24, 7(x0)          x24=000000xx  PA:00000007
892360000    22292 0000347a 007543b3 xor           x7, x10, x7          x7=00000000 x10:00000077  x7:00000077
892560000    22297 0000347e fe7a0503 lb            x10, -25(x20)       x10=00000077 x20:00030a70  PA:00030a57
892600000    22298 00003482 00cda9a3 sw            x12, 19(x27)        x12:0000238c x27:00059e30  PA:00059e43
893120000    22311 00003486 01868cb3 add           x25, x13, x24       x25=xxxxxxxx x13:00000000 x24:000000xx
8933 [The core trace abruptly stops here]
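To make the difference between the two opcodes seen at pc 0x349a easier to see, the two instruction words from the trace can be decoded field by field. The decoder below is a generic RV32 I-type field split written for this note; it says nothing about where in the core the corruption happens.

```python
def decode_i_type(word):
    """Split a 32-bit RV32 I-type instruction word into its fields."""
    imm = (word >> 20) & 0xFFF
    if imm & 0x800:          # sign-extend the 12-bit immediate
        imm -= 0x1000
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "imm": imm,
    }


expected = decode_i_type(0x01774C03)   # lbu x24, 23(x14), as fetched normally
observed = decode_i_type(0x00704C03)   # lbu x24, 7(x0), as seen after mret
changed_bits = 0x01774C03 ^ 0x00704C03

print(expected)
print(observed)
print(hex(changed_bits))
```

Only the rs1 field and one immediate bit differ between the two words; the opcode, funct3 and rd fields are identical.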

verilator failing - %Warning-PINMISSING: top.sv:74: Cell has missing pin: fregfile_disable_i

Hi
I am running Ubuntu verilator version
Verilator 3.856 2014-03-11 rev verilator_3_855-19-g749ff02

this was installed from
$ apt-get install verilator
(currently re-testing with git stable release of verilator)

When trying to run make in the verilator-model directory I get the following failure

%Warning-PINMISSING: top.sv:74: Cell has missing pin: fregfile_disable_i
%Warning-PINMISSING: Use "/* verilator lint_off PINMISSING */" and lint_on around source to disable this message.
%Error: ../rtl/riscv_hwloop_regs.sv:73: Unsupported: Assignment pattern applies against non struct/union: PACKARRAYDTYPE
%Error: ../rtl/riscv_hwloop_regs.sv:89: Unsupported: Assignment pattern applies against non struct/union: PACKARRAYDTYPE
%Error: ../rtl/riscv_hwloop_regs.sv:110: Unsupported: Assignment pattern applies against non struct/union: PACKARRAYDTYPE
%Error: Exiting due to 3 error(s)
%Error: Command Failed /usr/bin/verilator_bin -O3 -CFLAGS '-O3 -g3 -std=gnu++11' -Wno-CASEINCOMPLETE -Wno-LITENDIAN -Wno-UNOPT -Wno-UNOPTFLAT -Wno-WIDTH -Wno-fatal --top-module top --Mdir obj_dir --trace -DPULP_FPGA_EMUL -cc +incdir+../rtl/include cluster_clock_gating.sv dp_ram.sv ram.sv top.sv ../rtl/include/apu_core_package.sv ../rtl/include/riscv_defines.sv ../rtl/include/riscv_tracer_defines.sv ../rtl/riscv_alu.sv ../rtl/riscv_alu_basic.sv ../rtl/riscv_alu_div.sv ../rtl/riscv_compressed_decoder.sv ../rtl/riscv_controller.sv ../rtl/riscv_cs_registers.sv ../rtl/riscv_debug_unit.sv ../rtl/riscv_decoder.sv ../rtl/riscv_int_controller.sv ../rtl/riscv_ex_stage.sv ../rtl/riscv_hwloop_controller.sv ../rtl/riscv_hwloop_regs.sv ../rtl/riscv_id_stage.sv ../rtl/riscv_if_stage.sv ../rtl/riscv_load_store_unit.sv ../rtl/riscv_mult.sv ../rtl/riscv_prefetch_buffer.sv ../rtl/riscv_prefetch_L0_buffer.sv ../rtl/riscv_register_file.sv ../rtl/riscv_core.sv ../rtl/riscv_apu_disp.sv ../rtl/riscv_L0_buffer.sv testbench.cpp --exe
make: *** [obj_dir/Vtop.mk] Error 10

Thx
Lee

Unused APU signal tying off logic in ID stage causes issues.

Without begin/end, the for loop here can be taken to execute over each of the following assign statements. The generate loop should be labeled and begin/end statements inserted as well.

  for (genvar i=0;i<APU_NARGS_CPU;i++)
      assign apu_operands[i]         = '0;
      assign apu_waddr               = '0;
      assign apu_flags               = '0;
      assign apu_write_regs_o        = '0;
      assign apu_read_regs_o         = '0;
      assign apu_write_regs_valid_o  = '0;
      assign apu_read_regs_valid_o   = '0;

Proposed revision:

 genvar i;
    for (i=0;i<APU_NARGS_CPU;i++) begin : apu_tie_off
     assign apu_operands[i]         = 32'h00;
    end
     assign apu_waddr               = {APU_NDSFLAGS_CPU{1'b0}};
     assign apu_flags               = 6'b0;
     assign apu_write_regs_o        = 6'b0;
     assign apu_read_regs_o         = 6'b0;
     assign apu_write_regs_valid_o  = 1'b0;
     assign apu_read_regs_valid_o   = 1'b0;

Bug for fadd.s ???

Hi,

From the IEEE 754-2008 standard:
for addition(x, ∞) or addition(∞, x) with finite x, no exception should be signaled, but the RTL sets the overflow flag and the inexact flag. This violates the standard.

Thanks
Dream

FPGA Synthesis Target Frequency

What is the target/achievable frequency for running the single core on FPGA?
I see from http://www.pulp-platform.org/documentation/ 654 MHz on 65nm ASIC, but no info for FPGA implementation.
I can achieve 40 MHz running alongside a ZynqPS, two Xilinx Block RAMs and the AXI interconnection between the two. I am wondering if you have other numbers to compare with.

Thank you.

Core reset sequence and simulation

I am testing the RI5CY core alone (everything in this repo, nothing from pulpino) and have problems simulating it with my testbench. Looking at the waveforms, many signals appear not to be initialized on reset, resulting in 'X', and the controller's FSM loops in the FIRST_FETCH state.
Is there a specific reset sequence to be sent or other special initializations on the core's inputs?
My reset sequence is the following in an initial statement:
clk_i = 1'b0;
rst_ni = 1'b0;          // assert reset
clock_en_i = 1'b0;
test_en_i = 1'b0;
fetch_enable_i = 1'b0;
#500ns;
clock_en_i = 1'b1;
#1500ns;
rst_ni = 1'b1;          // release reset
#1500ns;
fetch_enable_i = 1'b1;

issue in writing logic of utvec_n

Hi,

There is an issue in the write logic for utvec_n in the CSR registers. utvec_n is 24 bits wide, but the value written to it is 32 bits wide: {csr_wdata_int[31:8],8'h0}.

///----------RTL------------------------///
logic [23:0] utvec_n, utvec_q;

 // utvec: user trap-handler base address
  12'h005: if (csr_we_int) begin
    utvec_n    = {csr_wdata_int[31:8],8'h0};
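The effect of the width mismatch can be illustrated with a small model of SystemVerilog assignment truncation, where the upper bits of a wider right-hand side are dropped. The helper names are made up, and the "intended" value is an assumption about what the author of the RTL meant, namely keeping csr_wdata_int[31:8].

```python
def assign_to_width(value, width):
    """Model assigning a wider value to a 'width'-bit signal: upper bits are dropped."""
    return value & ((1 << width) - 1)


def utvec_next_as_written(csr_wdata):
    """Model of 'utvec_n = {csr_wdata_int[31:8], 8'h0}' with a 24-bit utvec_n."""
    rhs_32bit = ((csr_wdata >> 8) & 0xFFFFFF) << 8
    return assign_to_width(rhs_32bit, 24)


def utvec_next_intended(csr_wdata):
    """Assumed intent: store bits [31:8] of the write data in the 24-bit register."""
    return (csr_wdata >> 8) & 0xFFFFFF


wdata = 0x12345678
print(hex(utvec_next_as_written(wdata)))   # 0x345600
print(hex(utvec_next_intended(wdata)))     # 0x123456
```

In the as-written form the top byte of the base address is lost and the low byte of the register is always zero.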

CPI too high for BLT and MUL?

I have done some RTL simulations on the RISCV in combination with the Pulpino platform. To evaluate the Cycles Per Instruction, I counted the difference in cycle count between the current and next instructions.

Like this:

         Time                Cycles  PC       Instr    Mnemonic
         4840000             104     00000080 00c0006f jal              x0, 12             
         4920000             106     0000008c 30501073 csrrw            x0, x0, 0x305      
         4960000             107     00000090 00000093 addi             x1, x0, 0            x1=00000000

jal takes 106-104 = 2 cycles
csrrw takes 107-106 = 1 cycle

and so on.
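The counting method described above can be scripted as follows. This is a sketch that assumes the column layout shown in the snippets (time, cycles, pc, instruction word, mnemonic); it measures the gap between consecutive trace entries, which may not isolate the latency of a single instruction.

```python
def cycles_per_instruction(trace_lines):
    """Return (mnemonic, cycle gap to the next trace entry) for each line but the last."""
    rows = []
    for line in trace_lines:
        fields = line.replace("*", "").split()   # drop any '**' highlight markers
        if len(fields) < 5 or not fields[0].isdigit():
            continue                             # skip headers and blank lines
        rows.append((int(fields[1]), fields[4]))
    return [(mnemonic, next_cycle - cycle)
            for (cycle, mnemonic), (next_cycle, _) in zip(rows, rows[1:])]


TRACE = """
         Time                Cycles  PC       Instr    Mnemonic
         4840000             104     00000080 00c0006f jal              x0, 12
         4920000             106     0000008c 30501073 csrrw            x0, x0, 0x305
         4960000             107     00000090 00000093 addi             x1, x0, 0            x1=00000000
""".splitlines()

print(cycles_per_instruction(TRACE))
```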

Now I came across a few remarkable observations:

  • the blt instruction can take up to 63 cycles (<3 cycles on average though)!
  • the mul instruction can also take up to 31 cycles (~1 cycle on average though)!

Is there a reason why this can happen?

blt instruction trace snippet:

         Time                 Cycles PC       Instr    Mnemonic
         18177360000          454417 00001ddc 02a64533 div              x10, x12, x10       x10=00000b6a x12:00680000 x10:0000091c
         18178280000          454440 00001de0 00ab8bb3 add              x23, x23, x10       x23=00000b6a x23:00000000 x10:00000b6a
         **18178320000          454441 00001de4 f28b42e3 blt              x22, x8, -220       x22:00000008  x8:00000100**
         18180840000          454504 00001d08 00098693 addi             x13, x19, 0         x13=00100f58 x19:00100f58
         18180880000          454505 00001d0c 00000713 addi             x14, x0, 0          x14=00000000
         18180920000          454506 00001d10 00000c93 addi             x25, x0, 0          x25=00000000

mul instruction trace snippet:

         Time                 Cycles PC       Instr    Mnemonic
         18171080000          454260 00001db0 00f50533 add              x10, x10, x15       x10=02980000 x10:02340000 x15:00640000
         18171120000          454261 00001db4 03454533 div              x10, x10, x20       x10=00530000 x10:02340000 x20:00000008
         **18172360000          454292 00001db8 02e70733 mul              x14, x14, x14       x14=00000000 x14:00000000 x14:00000000**
         18173600000          454323 00001dbc 41275733 sra              x14, x14, x18       x14=00000000 x14:00000000 x18:00000004
         18173640000          454324 00001dc0 40e50533 sub              x10, x10, x14       x10=00530000 x10:00530000 x14:00000000
         18173680000          454325 00001dc4 0b5000ef jal              x1, 2228             x1=00001dc8

Branch prediction and flushing

Hey,
I was searching for a branch prediction unit, but there seems to be none, or rather only a static prediction that assumes every branch is not taken. Am I right?
When a conditional branch is taken, is it correct that no explicit 'flush' signal is set, and that signals indicating a valid/not valid instruction are set instead?

Request to add support for 16-bit R/W accesses to debug port

It would be nice for those of us who only have 16-bit R/W access from the host while halted to be able to single step the processor. This means adding a two-bit input, word_lane_en[1:0], or 4-bit input, byte_lane_en[3:0], which tells the debug unit which bits are being accessed. I don't require 8-bit R/W access, but it might be nice to provide it as well if others may need it.

Documentation issue with lp.setupi instruction

There is a documentation bug with lp.setupi instruction. The document describes the instruction mnemonic as given below.
lp.setupi L, uimmS, uimmL

lpstart[L] = pc + 4
lpend[L] = pc + (uimmS << 1)
lpcount[L] = uimmL
The correct mnemonic is - lp.setupi L, uimmL, uimmS
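With the corrected operand order, the register updates listed above can be modeled as follows. This only restates the three assignments from the documentation; the function name and the dictionary it returns are illustrative.

```python
def lp_setupi(pc, uimmL, uimmS):
    """Model of 'lp.setupi L, uimmL, uimmS' for one hardware loop L.

    uimmL is the iteration count, uimmS the end offset in halfwords.
    """
    return {
        "lpstart": pc + 4,
        "lpend": pc + (uimmS << 1),
        "lpcount": uimmL,
    }


# Example values chosen for illustration.
print(lp_setupi(pc=0x100, uimmL=10, uimmS=8))
```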

Streaming operator not supported in various EDA tools.

The streaming operator used in riscv_alu_div is not supported by various EDA tools and some emulators.
assign ResReg_DP_rev = {<<{ResReg_DP}};
I suggest using a for loop until streaming operator support has propagated throughout the EDA tool industry.

   always_comb begin
    for (int i = 0; i < C_WIDTH; i++) begin
      ResReg_DP_rev[i] = ResReg_DP[C_WIDTH - 1 - i];
    end
  end
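As a cross-check of the suggested loop, the same bit reversal can be modeled in software. The function below mirrors the loop index by index; the name `reverse_bits` is made up for this note.

```python
def reverse_bits(value, width):
    """Reverse the bit order of a 'width'-bit value, like the for loop above."""
    result = 0
    for i in range(width):
        # result[i] = value[width - 1 - i]
        if (value >> (width - 1 - i)) & 1:
            result |= 1 << i
    return result


print(hex(reverse_bits(0x1, 32)))        # 0x80000000
print(bin(reverse_bits(0b0011, 4)))      # 0b1100
```

Applying the function twice returns the original value, which gives a quick sanity check for a testbench comparing the loop against the streaming operator.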

Instruction wrongly executed after core branch

Hi,
Here is the assembly code that can reproduce the issue.
5a2: 30472703 lw a4,772(a4) // load data from address 0x4002_c304
5a6: f81630e3 p.bneimm a2,1,526 //a2==1, should jump to 526
5aa: 8321 srli a4,a4,0x8 //

The core loads data from address 0x4002_c304 at pc=5a2. The data does not arrive immediately but is delayed by several cycles. When the core jumps to 0x526 by executing the branch instruction at 5a6, the instruction at 5aa should not be executed, but it actually is. This issue happens when the instruction at 5aa uses the same register (a4) as the load at 5a2. id_ready is deasserted because of load_stall, and instr_valid is not cleared.

issue snap

Thanks,
Tim

Unexpected mepc csr read return value from csrrwi instruction

A directed test consisting of csrrwi instructions is failing with a CSR read return value mismatch.
The instructions at cycle numbers 20870, 20871 and 20872 make up the test. The second csrrwi is an observation point for the first csrrwi instruction. It should read a value of 0x8, whereas it returns 0x15. Not only does it not return the value written by the previous csrrwi instruction, bit 0 is also set to 1, which is not allowed in the mepc register in implementations that support the compressed extension.
834920000 20856 00003186 fe478683 lb x13, -28(x15) x13=ffffffbf x15:00058870 PA:00058854
835200000 20863 0000318a 004362b3 or x5, x6, x4 x5=0000001a x6:00000000 x4:0000001a
835400000 20868 0000318e 00b69693 slli x13, x13, 0xb x13=fffdf800 x13:ffffffbf
835440000 20869 00003190 00448493 addi x9, x9, 4 x9=0004fb10 x9:0004fb0c
835480000 20870 00003194 34145273 csrrwi x4, 0x00000008, 0x341 x4=000030f2
835520000 20871 00003198 fe190383 lb x7, -31(x18) x7=00000071 x18:000930e0 PA:000930c1
835560000 20872 0000319c 341ad2f3 csrrwi x5, 0x00000015, 0x341 x5=00000015
835760000 20877 000031a0 0054a023 sw x5, 0(x9) x5:00000015 x9:0004fb10 PA:0004fb10
835800000 20878 000031a4 fff95303 lhu x6, -1(x18) x6=00004400 x18:000930e0 PA:000930df
836320000 20891 000031a8 fe5d22a3 sw x5, -27(x26) x5:00000015 x26:0008b890 PA:0008b875
837120000 20911 0000207c 29c0006f jal x0, 668
837200000 20913 00002318 34202ef3 csrrs x29, x0, 0x342 x29=8000001f
837240000 20914 0000231c 30002ef3 csrrs x29, x0, 0x300 x29=00001880
837400000 20918 00002320 34102ef3 csrrs x29, x0, 0x341 x29=000031ac
837440000 20919 00002324 1a104eb7 lui x29, 0x1a104000 x29=1a104000
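The expected behavior described above can be captured in a short reference model. It assumes, as stated in the report, that bit 0 of mepc always reads as zero when the compressed extension is supported; the class name and its interface are made up for illustration.

```python
class MepcModel:
    """Reference model of mepc on a core with the compressed extension."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFE      # bit 0 is always zero

    def csrrwi(self, uimm):
        """Write the 5-bit immediate and return the old CSR value (read into rd)."""
        old = self.value
        self.value = (uimm & 0x1F) & ~1      # bit 0 of the written value is dropped
        return old


mepc = MepcModel(0x30F2)                     # value read by the first csrrwi in the trace
first_read = mepc.csrrwi(0x08)               # cycle 20870: x4 = 0x30f2
second_read = mepc.csrrwi(0x15)              # cycle 20872: expected x5 = 0x8
print(hex(first_read), hex(second_read), hex(mepc.value))
```

Under these assumptions the second read returns 0x8 and the stored value ends up as 0x14, never 0x15.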

Large program crashes simulator - probable comb. loop within processor?

This program:

void test(void);

int main(void) {
    test();

    return 0;
}

#include "256Ki_nops" // 262144 32-bit nops

void test(void) {
}

crashes the simulator because of the consecutive execution of these two instructions:

   103b8:       00100097                auipc   ra,0x100
   103bc:       020080e7                jalr    32(ra) # 1103d8 <test>

which call the test function.

The compiler generates this sequence of instructions in order to perform a long branch (~> 1MiB).
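The arithmetic behind the two-instruction sequence can be checked with a small model of AUIPC and JALR. This is a sketch of the base ISA semantics using the addresses from the disassembly above; the helper names are made up.

```python
MASK32 = 0xFFFFFFFF
JAL_REACH = 1 << 20                # a single JAL reaches roughly +/- 1 MiB


def auipc(pc, imm20):
    """AUIPC: pc plus the 20-bit immediate shifted left by 12."""
    return (pc + (imm20 << 12)) & MASK32


def jalr_target(base, offset):
    """JALR: base register plus signed offset, with bit 0 cleared."""
    return (base + offset) & MASK32 & ~1


call_pc = 0x103B8
ra = auipc(call_pc, 0x100)         # auipc ra,0x100
target = jalr_target(ra, 32)       # jalr 32(ra)
distance = target - call_pc

print(hex(ra), hex(target), distance >= JAL_REACH)
```

The target comes out as 0x1103d8, the address of test in the listing, and the distance is just over 1 MiB, which is why a single jal cannot be used here.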

Illegal instruction exception with fence and fence.i instructions

Executing the fence and fence.i instructions results in an illegal instruction exception. This is unexpected, as the PULPino RI5CY core claims full support for the RV32I base integer instruction set. Without these instructions it is not possible to implement self-modifying or cross-modifying code, or a relaxed memory model between hardware threads.
Filing this issue to track a resolution, which is TBD.

The core jumped into the wrong address.

When I change the riscv core from commit 6c928ad to f0180d6, something goes wrong.

This is the assembly code.
image

The incorrect waveform:
image
The correct waveform:
image

I traced the signal in riscv_controller.sv:
if ((jump_in_dec_i == BRANCH_JALR) &&
    (((regfile_we_wb_i == 1'b1) && (reg_d_wb_is_reg_a_i == 1'b1)) ||
     ((regfile_we_ex_i == 1'b1) && (reg_d_ex_is_reg_a_i == 1'b1)) ||
     ((regfile_alu_we_fw_i == 1'b1) && (reg_d_alu_is_reg_a_i == 1'b1))) )
begin
  jr_stall_o    = 1'b1;
  deassert_we_o = 1'b1;
end
The pipeline does not stall while waiting for the branch address.

Packed vector ranges declared with negative parameters

When some parameters are set to 0, signals are declared with negative indexes:

riscv_core.sv:105       Vector range '[WAPUTYPE - 1:0]' ([-1:0]) of 'apu_master_type_o' is ascending, should be descending
riscv_core.sv:137       Vector range '[N_EXT_PERF_COUNTERS - 1:0]' ([-1:0]) of 'ext_perf_counters_i' is ascending, should be descending
riscv_core.sv:219       Vector range '[WAPUTYPE - 1:0]' ([-1:0]) of 'apu_type_ex' is ascending, should be descending
riscv_cs_registers.sv:133       Vector range '[N_EXT_CNT - 1:0]' ([-1:0]) of 'ext_counters_i' is ascending, should be descending
riscv_decoder.sv:116    Vector range '[WAPUTYPE - 1:0]' ([-1:0]) of 'apu_type_o' is ascending, should be descending
riscv_decoder.sv:119    Vector range '[WAPUTYPE - 1:0]' ([-1:0]) of 'apu_flags_src_o' is ascending, should be descending
riscv_id_stage.sv:159   Vector range '[WAPUTYPE - 1:0]' ([-1:0]) of 'apu_type_ex_o' is ascending, should be descending
riscv_id_stage.sv:361   Vector range '[WAPUTYPE - 1:0]' ([-1:0]) of 'apu_type' is ascending, should be descending
riscv_id_stage.sv:373   Vector range '[WAPUTYPE - 1:0]' ([-1:0]) of 'apu_flags_src' is ascending, should be descending
riscv_prefetch_buffer.sv:73     Vector range '[0:DEPTH - 1]' ([0:3]) of 'addr_n' is ascending, should be descending
riscv_prefetch_buffer.sv:74     Vector range '[0:DEPTH - 1]' ([0:3]) of 'rdata_n' is ascending, should be descending
riscv_prefetch_buffer.sv:75     Vector range '[0:DEPTH - 1]' ([0:3]) of 'valid_n' is ascending, should be descending
riscv_prefetch_buffer.sv:76     Vector range '[0:1]' of 'is_hwlp_n' is ascending, should be descending

This causes issues in various synthesis tools.
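To illustrate why a parameter value of 0 is a problem, here is a minimal sketch of how a range written as [PARAM - 1:0] evaluates. The helper name is made up for this example; the width rule (absolute difference of the bounds plus one) is the usual SystemVerilog rule for packed ranges.

```python
def packed_range(width_param):
    """Evaluate a '[width_param - 1:0]' range and describe it."""
    msb, lsb = width_param - 1, 0
    n_bits = abs(msb - lsb) + 1
    # A one-bit [0:0] range is treated as descending here
    direction = "descending" if msb >= lsb else "ascending"
    return msb, lsb, n_bits, direction

for value in (4, 1, 0):
    print(value, packed_range(value))
```

With the parameter at 0 the range becomes [-1:0], which is an ascending two-bit vector rather than an empty one, so the port ends up wider than intended as well as triggering the lint messages.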

Documentation???

Hello,

Is there a place where the parameters of the core and the apu_master interface are explained?

Also, how do I configure the implementation? For example, I would like to implement it without a HW FPU.

Many Thanks!

rudi

LSU assertion fails

There appears to be a problem with the 2nd assertion property in riscv_load_store_unit. I believe I have a valid case where it fails. It is failing in my simulation where the grant (data_gnt_i) occurs in the same cycle as the request (data_req_o) during a write transaction. The data_rvalid_i is asserted in the same cycle as is required for a write. Since there are no intervening cycles, the current state, CS, is still IDLE when this happens. Thus, the assertion fails.

This was a write transaction, but if grant and rvalid are allowed to be asserted in the first request cycle in a read transaction, we should see the same bogus assertion failure.

RISCY hangs when one single cycle mul was followed by ecall

Hi,

Here is a simple assembly test case:
_ sc1g9p829 w3 cvmz381

RISCY hangs and never finishes. I noticed that ctrl_fsm_cs in riscv_controller stopped in the FLUSH_EX state and could not enter the FLUSH_WB state, since ex_valid_i never goes high while the pipeline is empty. The ecall also stalls instruction fetch, so the core is deadlocked.

waveform

Thanks
Dream

Potential for call stack corruption

It looks like the RTL doesn't describe a machine scratch register among the CSRs which have been implemented. This register is intended to point to a trap-handler context storage container, such that on entry to the trap handler its contents are swapped with those of the register holding the current stack pointer (a single instruction performs the swap).

Without this, the trap-handler may have to use the stack of the thread which was interrupted, in order to save its context. In the absence of any atomic stack operations, this is fine provided that the stack pointer is moved to reflect the increase in size of the stack before the new stack space is used, i.e. on a push the stack pointer is decremented before the store is made. If this is not done, and an interrupt occurs during a push operation after the store and before the SP decrement, the handler could end up pushing the context of the interrupted thread on top of the value pushed by the interrupted thread.

The problem is that the RISC-V standard C calling convention doesn't appear to constrain the order in which the store and decrement happen (as expected), and I looked at some assembly produced by the C compiler earlier and it does not appear to exhibit the behaviour required to avoid this.

FIX: Implement the machine scratch register.
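The race described above can be modelled with a toy stack. This is only a sketch: the addresses, values and function names are invented for illustration, and the handler is assumed to save its context on the interrupted thread's stack using a decrement-then-store push.

```python
THREAD_VALUE = 0xAAAA      # word the interrupted thread wants to push
HANDLER_CONTEXT = 0xBBBB   # word the trap handler saves on the same stack
INITIAL_SP = 0x1000

def run_push(order, interrupt_between):
    """Simulate one 4-byte push, optionally interrupted between its two steps.

    order is "store_first" or "decrement_first".
    Returns the word found at the thread's stack pointer after the push.
    """
    mem = {}
    sp = INITIAL_SP

    def handler(sp_at_entry):
        # No scratch register: the handler pushes onto the thread's stack
        mem[sp_at_entry - 4] = HANDLER_CONTEXT

    if order == "store_first":
        mem[sp - 4] = THREAD_VALUE
        if interrupt_between:
            handler(sp)        # sp still points above the freshly stored word
        sp -= 4
    else:
        sp -= 4
        if interrupt_between:
            handler(sp)        # sp already covers the slot about to be written
        mem[sp] = THREAD_VALUE
    return mem[sp]

print(hex(run_push("store_first", True)))
print(hex(run_push("decrement_first", True)))
```

In the store-first case the handler overwrites the value the thread had just stored, whereas in the decrement-first case the thread's slot is left intact.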

Responding to an instruction fetch in the same cycle leaves the core deadlocked

I attached an instruction cache to the core that, under certain circumstances, can provide a response to an instruction request in the same cycle (instr_req, instr_gnt, and instr_r_valid all in the same cycle). In this case the core ignores the response and deadlocks.

See the attached image: The core requests instruction word 0x1c000090, which is granted and responded to in the same cycle. However the core is now stuck, probably ignoring the response.
riscy_rvalid_ignored

build verilator error

Hi,
when I run "make" in the verilator-model directory to build the verilator model, I get the following error message:

g++ -Wall -Werror -std=c++11 -Iobj_dir `pkg-config --cflags verilator` -c -o testbench.o testbench.cpp
testbench.cpp: In function ‘int main(int, char**)’:
testbench.cpp:226:13: error: ‘new’ of type ‘Vtop’ with extended alignment 128 [-Werror=aligned-new=]
cpu = new Vtop;
^~~~
testbench.cpp:226:13: note: uses ‘void* operator new(std::size_t)’, which does not have an alignment parameter
testbench.cpp:226:13: note: use ‘-faligned-new’ to enable C++17 over-aligned new support
cc1plus: all warnings being treated as errors
<builtin>: recipe for target 'testbench.o' failed
make: *** [testbench.o] Error 1

Does anyone know what's wrong with it?
What's more, I don't understand the sentence "that version of verilator must be known to pkg-config". Is this related to the above error?

THX,
Allen

Bug for fle/flt/feq ???

Hi,

From the riscv_spec_v2.2 Page 52:
fcmp

I created a directed test case that sets the operands to NaN as shown below, but I did not see the NV flag being set in the RTL.
ass
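For reference, here is a small software model of the expected flag behaviour for single-precision compares, as I understand the RISC-V F extension: flt and fle are signaling comparisons that raise NV when either operand is any NaN, while feq is a quiet comparison that raises NV only for a signaling NaN. The function names are illustrative and this is a sketch, not the RTL's algorithm.

```python
import struct

def is_nan(bits):
    """True if the 32-bit pattern encodes a NaN (exponent all ones, mantissa non-zero)."""
    return ((bits >> 23) & 0xFF) == 0xFF and (bits & 0x7FFFFF) != 0

def is_snan(bits):
    """True for a signaling NaN (mantissa MSB clear)."""
    return is_nan(bits) and ((bits >> 22) & 1) == 0

def to_float(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def fcmp(op, a_bits, b_bits):
    """Return (result, nv_flag) for op in {"feq", "flt", "fle"}."""
    any_nan = is_nan(a_bits) or is_nan(b_bits)
    any_snan = is_snan(a_bits) or is_snan(b_bits)
    nv = any_snan if op == "feq" else any_nan
    if any_nan:
        return 0, nv           # comparisons with NaN always produce 0
    a, b = to_float(a_bits), to_float(b_bits)
    result = {"feq": a == b, "flt": a < b, "fle": a <= b}[op]
    return int(result), nv

QNAN, SNAN, ONE = 0x7FC00000, 0x7F800001, 0x3F800000
for op in ("feq", "flt", "fle"):
    print(op, fcmp(op, QNAN, ONE), fcmp(op, SNAN, ONE))
```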

Thanks
Dream

Instruction port RGV protocol issue

Bug condition

The RGV protocol as explained in the user manual is not respected in the following case:

  • Outside memory not responding immediately (= grant signal not asserted in the same cycle as request)
  • Jump instruction received

Waveforms are attached

  • wave_bug_jump_1/2:
    Show two examples of the abnormal behavior at two different times.
    In those waves we see the address change from the prefetch address (0x0084) to the jump address
    (0x008C) (see the red cursor) without waiting for the grant. This is inconsistent with the RGV protocol explained in the user manual, which states that the address is updated only after the grant response.
    Consequently, if the outside memory samples the address sent at the beginning of the request rather than the one present during the grant response (which the protocol theoretically permits), the returned data will correspond to the unwanted prefetch address, and the valid instruction targeted by the jump will be lost.
    wave_bug_jump_1
    wave_bug_jump_2

  • wave_branch_ok:
    Shows there is no issue when the address jump corresponds to a conditional branch instruction.
    wave_branch_ok
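A check of this kind can be sketched in a few lines. The trace format below (one tuple of request, grant and address per clock cycle) and the function name are assumptions made for this example; only the two addresses 0x0084 and 0x008C come from the waveforms described above.

```python
def find_address_changes(trace):
    """Return cycle indices where the address changed while a request was pending.

    trace is a list of (req, gnt, addr) tuples, one per clock cycle.
    A request is considered pending if req was high and gnt was low
    in the previous cycle.
    """
    violations = []
    for cycle in range(1, len(trace)):
        prev_req, prev_gnt, prev_addr = trace[cycle - 1]
        req, _gnt, addr = trace[cycle]
        if prev_req and not prev_gnt and req and addr != prev_addr:
            violations.append(cycle)
    return violations

# Jump case: the address moves from 0x84 to 0x8C before the grant arrives
jump_trace = [(1, 0, 0x84), (1, 0, 0x8C), (1, 1, 0x8C)]
# Well-behaved case: the address only changes after the request was granted
ok_trace = [(1, 0, 0x84), (1, 1, 0x84), (1, 0, 0x8C), (1, 1, 0x8C)]

print(find_address_changes(jump_trace), find_address_changes(ok_trace))
```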

Investigation

In this case, the address switch is controlled by the pc_set_i signal of "controller.sv".
In "controller.sv" we can see a condition which has presumably been added to wait for the end of the current request before updating the address.
(refer to attached file "controller_code_1")
controller_code_1
But as the jr_stall_o signal is not set to 1'b1 during the "JAL" jump, the pc_set_i signal is asserted immediately.
If we analyze the code where jr_stall_o is generated, we can see that it depends on a "JALR" jump type but not on our failing "JAL" jump.
(refer to attached file "controller_code_2")
controller_code_2
I do not know whether this is the source of the issue.

A problem in "prefetch_L0_buffer.sv"

In the file "prefetch_L0_buffer.sv", at line 667:

    if (branch_i)
    begin
      instr_req_o = 1'b1;

      if (instr_gnt_i)
        NS = WAIT_RVALID;
      else
        NS = WAIT_GNT;
    end
    else
    begin
      instr_req_o    = 1'b1;

      if (instr_gnt_i)
        NS = WAIT_RVALID;
      else
        NS = WAIT_GNT;
    end

Both branches contain the same code regardless of the value of "branch_i". Is this a mistake?

Debug halt causes illegal instructions

Summary

We are working on a verilator model of RI5CY and are attempting to understand how to use the debug unit. Normal execution seems to work correctly as far as we can observe. Whenever we attempt to halt the processor by writing DBG_CTRL_HALT to DBG_CTRL, an illegal instruction exception seems to be generated.

Details

Our test bench does the following (the test bench file is at https://github.com/embecosm/ri5cy/blob/debug_halt/verilator-model/testbench.cpp), which is inspired by looking at tb.svh and spi_debug_test.svh in the PULPino repository:

  1. Sets irq_i, debug_req_i, fetch_enable_i and rstn_i to 0.
  2. Runs for 100 cycles (for the reset, probably this is more than needed)
  3. Sets rstn_i to 1 and fetch_enable_i to 1.
  4. Runs for 20 clock cycles, to allow the CPU to do some "normal" execution. This appears to work without problems.
  5. Writes to the debug unit DBG_CTRL = DBG_CTRL | DBG_CTRL_HALT
  6. Writes DBG_IE = 0xF
  7. Waits for debug stall by reading DBG_CTRL until we observe that DBG_CTRL_HALT is set.
  8. Sets single step by writing DBG_CTRL = DBG_CTRL_HALT | DBG_CTRL_SSTE.

At this point we think we should be ready to single-step. For each single step, we:

  1. Clear DBG_HIT (I didn't see this in the test bench, but it's what the debug_bridge GDB server appeared to do for single steps).
  2. Write DBG_CTRL = DBG_CTRL_SSTE.
  3. Wait for debug stall (as in step 7 above).
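The DBG_CTRL values used in these steps can be spelled out as follows. The bit positions are an assumption (HALT at bit 16, SSTE at bit 0), chosen to be consistent with the value 10001 that the testbench prints for DBG_CTRL; the helper name is made up for this sketch.

```python
# Assumed bit positions, not taken from the RTL
DBG_CTRL_HALT = 1 << 16
DBG_CTRL_SSTE = 1 << 0

def decode_dbg_ctrl(value):
    """Split a DBG_CTRL value into the two fields used in the steps above."""
    return {
        "halt": bool(value & DBG_CTRL_HALT),
        "sste": bool(value & DBG_CTRL_SSTE),
    }

halt_and_step = DBG_CTRL_HALT | DBG_CTRL_SSTE   # value written in setup step 8
step_only = DBG_CTRL_SSTE                       # value written for each single step

print(hex(halt_and_step), decode_dbg_ctrl(halt_and_step))
print(hex(step_only), decode_dbg_ctrl(step_only))
```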

The first one or two single steps seem to work OK (depending on how many cycles we run for and how much we try and interact with the debug unit) but it seems that after about 8 cycles, an illegal instruction exception occurs.

Observations

A colleague of mine has suggested that it looks like something goes awry in the prefetch fifo - when executing normally, core/if_stage/prefetch_32/prefetch/fifo valid_Q[0] is never set, so we bypass the input of the fifo to the output without ever caching it in the fifo itself. In the debug case the core halts and the fifo fills up. The fifo then wipes one of the entries to 0, later it passes that to the decoder and it gives the undefined exception.

Reproducing

If it is helpful to try to reproduce the issue, then this can be done by:

  1. Cloning our RI5CY repo: https://github.com/embecosm/ri5cy
  2. Checking out the debug_halt branch: https://github.com/embecosm/ri5cy/tree/debug_halt
  3. In the verilator-model subdirectory, run make. Note that this requires the latest version of verilator installed (3.906), and for pkg-config to be able to find verilator.
  4. Execute ./testbench.

The output that it produces is reproduced here:

$ ./testbench 
DBG_CTRL  10001
DBG_HIT   0
DBG_CAUSE 1f
DBG_NPC   cc
DBG_PPC   c8
DBG_CTRL  10001
DBG_HIT   1
DBG_CAUSE 0
DBG_NPC   d0
DBG_PPC   cc
                1430: Entering exception routine.
                1430: Illegal instruction (core 0) at PC 0x000000d0:
DBG_CTRL  10001
DBG_HIT   1
DBG_CAUSE 0
DBG_NPC   84
DBG_PPC   d0
DBG_CTRL  10001
DBG_HIT   1
DBG_CAUSE 2
DBG_NPC   88
DBG_PPC   84
                1630: Entering exception routine.
                1630: Illegal instruction (core 0) at PC 0x00000088:
DBG_CTRL  10001
DBG_HIT   1
DBG_CAUSE 2
DBG_NPC   84
DBG_PPC   88

The output is produced by the stepSingle function - we read the registers each time we try to single step, to try and get some visibility into what is happening.

Questions

Looking at past issues, it seems that other people are successfully making use of the debug unit, so I guess there might be something wrong with what we're doing or how we're setting things up.

  • Are we doing something wrong in the setup or halting of the CPU?
  • If not, is there anything else we can look at to try and understand what is going on?
  • Any other thoughts?

Any assistance is greatly appreciated - many thanks in advance!

A problem in riscv_decoder.sv

In "riscv_decoder.sv", there is code like:

 // special p.elw (event load)
        if (instr_rdata_i[14:12] == 3'b110)
          data_load_event_o = 1'b1;

When instr[6:0] = OPCODE_LOAD or instr[6:0] = OPCODE_LOAD_POST, the above code will be executed. But I find that only the instruction "LWU" in RV64I could lead to

instr_rdata_i[14:12] = 3'b110

In "RV32IMFCXpulp", instr[14:12] could never be 3'b110.
Is this a little bug? Can you please explain the meaning of data_load_event_o?
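To make the encoding question concrete, the sketch below extracts bits [14:12] (funct3) from two load encodings that appear in the trace earlier on this page, plus one hand-built word with funct3 = 3'b110. The table lists only the standard RV32I loads; any custom use of the remaining encodings (the quoted comment mentions p.elw) is outside this sketch.

```python
OPCODE_LOAD = 0x03
RV32I_LOAD_FUNCT3 = {
    0b000: "lb",
    0b001: "lh",
    0b010: "lw",
    0b100: "lbu",
    0b101: "lhu",
}

def decode_load(instr):
    """Return (funct3, mnemonic) for a LOAD-opcode word, or None otherwise."""
    if (instr & 0x7F) != OPCODE_LOAD:
        return None
    funct3 = (instr >> 12) & 0x7
    return funct3, RV32I_LOAD_FUNCT3.get(funct3, "not an RV32I load")

print(decode_load(0xFE478683))   # lb  x13, -28(x15) from the trace
print(decode_load(0xFFF95303))   # lhu x6, -1(x18) from the trace
print(decode_load(0x00006003))   # hand-built word with funct3 = 0b110
```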

Thank you!

Supported Instructions

I am wondering if the {FENCE, FENCE.I} instructions will be implemented in future releases or if you plan to create new modules to handle them. I see no implementation details in this repo or the pulpino one. The only mentions are in the riscv_tracer.
Thank you.

Hardware loop counter does not decrement if loop end boundary spans across an uncompressed instruction

If a hardware loop is set up so that the loop end boundary falls in the middle of a word-length (non-compressed) instruction, the loop is not executed more than once. The loop counter, as read from the CSR, is also not decremented at the end of the loop.
This is unlikely to occur in practice, but the behavior should be documented.
Loop start ----+ +----- Loop end
| |
---------------| add | add |---------
|<- 4byte ->|<- 4byte ->|
| | |
Another interesting observation: when the loop start boundary falls in the middle of a word-length instruction, subsequent iterations of the loop decode and start executing from the second half of the first instruction.

Core go to sleep when illegal_insn_i

Hi,
I have a question about sleep control in the new RI5CY. When I set sleep enable in the event_unit and an illegal instruction is fetched, the core goes to sleep.
In riscv_controller.sv:
//--------------RTL begin ---------------------//
mret_insn_i | uret_insn_i | ecall_insn_i | pipe_flush_i | ebrk_insn_i | illegal_insn_i: begin
halt_if_o = 1'b1;
halt_id_o = 1'b1;
ctrl_fsm_ns = FLUSH_EX;
end
csr_status_i: begin
halt_if_o = 1'b1;
ctrl_fsm_ns = id_ready_i ? FLUSH_EX : DECODE;
end
//---------------RTL end------------------------//

The core goes to the FLUSH state under these conditions (illegal_insn_i, ecall_insn_i, ...) and then goes to sleep if sleep enable is set. I'm not sure whether this is expected behavior. In the previous RI5CY version, only the wfi and eret instructions could put the core to sleep.

Bug for fclass instr ????

Hi,

The definition of NAN from IEEE 754-2008 standard:
The value NaN (Not a Number) is used to represent a value that does not represent a real number. NaNs are represented by a bit pattern with an exponent of all ones and a non-zero mantissa. The sign bit can be 0 or 1 – it has no bearing.

There are two categories of NaNs:
QNaN (Quiet NaN) – arising when the result of an arithmetic operation is mathematically undefined. The MSB of the mantissa is '1' for this type of NaN.
SNaN (Signaling NaN) – used to signal an exception when an invalid operation is performed. The MSB of the mantissa is '0' for this type of NaN.
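Applied to single-precision bit patterns, the two definitions above can be checked as follows. The function names are illustrative; the result bit positions (bit 8 for a signaling NaN, bit 9 for a quiet NaN) follow the fclass result format in the RISC-V specification as I understand it.

```python
def classify_nan(bits):
    """Classify a 32-bit IEEE 754 single-precision pattern as qNaN, sNaN or neither."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent != 0xFF or mantissa == 0:
        return "not a NaN"
    # The MSB of the mantissa (bit 22) distinguishes quiet from signaling
    return "qNaN" if (mantissa >> 22) & 1 else "sNaN"

def fclass_nan_bits(bits):
    """Return only the NaN-related bits of an fclass.s result (0 if not a NaN)."""
    kind = classify_nan(bits)
    return {"sNaN": 1 << 8, "qNaN": 1 << 9}.get(kind, 0)

for pattern in (0x7FC00000, 0xFFC00000, 0x7F800001, 0x7F800000):
    print(hex(pattern), classify_nan(pattern), hex(fclass_nan_bits(pattern)))
```

The second pattern has the sign bit set and is still a quiet NaN, and the last one is +infinity, which has an all-ones exponent but a zero mantissa.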

But the implementation in fpu_core appears to have a problem in riscv_alu.sv:
fclass

It seems that the RTL violates the definition of NaN.

Thanks
Dream

[RISCY CORE] Problem with executing ecall instruction when there's a halt request pending

I am working on testing the single step feature of RISCY core.
Here's the test sequence:

  • Instructions are being fetched and executed in the background
  • Set DBG_IE[ECALL] = 1'b1, expecting the PC to jump to the trap address (0x88) when ecall is executed.
  • Write 'h10000 to DBG_CTRL from the debug interface to enter debug mode.
  • There is a delay between DBG_CTRL being programmed on the debug interface and debug_halted_o actually being asserted. If ecall is executed during this window, NPC is changed to PC(ecall) + 4 instead of the trap address.
  • Program 'h1 to DBG_CTRL to enable single-step mode; the design then executes instructions from PC(ecall) + 4.

Please check whether this is a design issue.
Thanks
Tao

how do I use __builtin_pulp_dotsp2() in C?

I've been experimenting with these built-in C functions from Davide Schiavone's presentation, but I am having trouble with shortVec:

typedef short shortVec __attribute__ ((vector_size (2)));

void runTests(void) {
shortVec v1 = {5, 6};
...
}

warning: excess elements in vector initializer
shortVec v1 = {5, 6};
^
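One possible reading of the warning: GCC's vector_size attribute is measured in bytes, not elements. The arithmetic below is a sketch under the assumption that short is 2 bytes on this target; the helper name is made up.

```python
SIZEOF_SHORT = 2  # bytes; assumption: 16-bit short on this target

def vector_elements(vector_size_bytes, element_size_bytes=SIZEOF_SHORT):
    """Number of elements that fit in a GCC vector of the given byte size."""
    return vector_size_bytes // element_size_bytes

print(vector_elements(2))   # elements available with vector_size (2)
print(vector_elements(4))   # elements available with vector_size (4)
```

Under that assumption, vector_size (2) leaves room for a single short, so a two-element initializer has one element too many, while vector_size (4) would hold both.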

In debug, PPC reads the next instruction after 'ebreak'

The RI5CY core user manual says if debug mode is entered from an "ebreak" instruction, the PPC should read the address of the "ebreak" instruction. But it is actually reading the address of the next instruction. The NPC reads the next instruction address too, which agrees with the manual.

Port RTL to Verilog

Hi. I am wondering if you would accept a contribution that ports the (synthesizable) RTL from SystemVerilog to Verilog. The main use case (for me anyway) would be usability with yosys, and in particular its verification tools.

Address is changed in ongoing request on instruction bus

This is related to this pull request: lowRISC/ibex#7
However, the issue shows up in both the zeroriscy and the ri5cy cores.

When a branch instruction is encountered (and the branch is taken), the prefetch unit immediately puts the new address on the instruction bus, even when it is currently in the WAIT_GNT state.

Relevant sources:
ri5cy: https://github.com/pulp-platform/riscv/blob/master/rtl/riscv_prefetch_buffer.sv#L264
zeroriscy: https://github.com/pulp-platform/zero-riscy/blob/master/zeroriscy_prefetch_buffer.sv#L149

In our implementation this causes problems only when the branch_i and instr_gnt_i signals are asserted in the same cycle. However, even if the address is changed earlier (after instr_req_o has been asserted but before instr_gnt_i is received), the question remains whether the protocol specification (from the user manuals for the zeroriscy and ri5cy cores) allows this behavior.

timing2

The timing below shows this behavior when using the zeroriscy core. The address can change as late as the instr_gnt_i signal is received or at an earlier time in the WAIT_GNT state of the prefetch unit (2nd diagram).
untitled

untitled2

[RISCY CORE] ALU and debug register write request conflict in debug mode

In this test, I am verifying debug reads and writes after the core has entered the debug halted state (debug_halted_o is asserted). I noticed that the ALU is still running when debug_halted_o is asserted, which is unexpected. In the failing simulation, both the ALU and the debug interface try to write general purpose registers right after entering debug mode. The debug interface tries to write x27 with data 'ha5f6_11d1, and the ALU tries to write x14 with data 'ha. As the register file values show, the write to x27 succeeds, but the ALU write fails and x14 stays at its previous value 'h5.
As indicated by the trace file, x14 should be updated to 'ha. This instruction is the last instruction before entering debug mode:

36054000 36044 8000259e 03095733 divu x14, x18, x16 x14=0000000a x18:0000000a x16:00000001

Since the ALU write and the debug write target different registers, both of them should succeed.
I have seen another failure symptom when the debug write happens before the ALU write and both target the same register. The debug write value is overwritten by the later ALU write, which defeats the intention of the debug write. Can you check whether the above behavior is correct? I feel the ALU should be idle in debug mode.

This issue can be reproduced by running instructions that take multiple cycles to compute, entering debug mode before the computation is done, and trying to write some unrelated general purpose registers from the debug interface.

Core performance counter is not accounting all cycles

When using the CYCLES event, not all cycles are counted, although every cycle should be counted as soon as the core is clocked.
This seems to happen only with special instructions such as CSR instructions.
The issue can be reproduced with this test: tests/pulp_tests/bugs/perf_cycle_lost
This happens on all chips, even vega.
