picorv32's Introduction


PicoRV32 - A Size-Optimized RISC-V CPU

PicoRV32 is a CPU core that implements the RISC-V RV32IMC Instruction Set. It can be configured as RV32E, RV32I, RV32IC, RV32IM, or RV32IMC core, and optionally contains a built-in interrupt controller.

Tools (gcc, binutils, etc.) can be obtained via the RISC-V Website. The examples bundled with PicoRV32 expect various RV32 toolchains to be installed in /opt/riscv32i[m][c]. See the build instructions below for details. Many Linux distributions now include the tools for RISC-V (for example Ubuntu 20.04 has gcc-riscv64-unknown-elf). To compile using those, set TOOLCHAIN_PREFIX accordingly (e.g. make TOOLCHAIN_PREFIX=riscv64-unknown-elf-).

PicoRV32 is free and open hardware licensed under the ISC license (a license that is similar in terms to the MIT license or the 2-clause BSD license).

Features and Typical Applications

  • Small (750-2000 LUTs in 7-Series Xilinx Architecture)
  • High fmax (250-450 MHz on 7-Series Xilinx FPGAs)
  • Selectable native memory interface or AXI4-Lite master
  • Optional IRQ support (using a simple custom ISA)
  • Optional Co-Processor Interface

This CPU is meant to be used as an auxiliary processor in FPGA designs and ASICs. Due to its high fmax it can be integrated in most existing designs without crossing clock domains. When operated at a lower frequency, it will have a lot of timing slack and thus can be added to a design without compromising timing closure.

For even smaller size it is possible to disable support for registers x16..x31 as well as the RDCYCLE[H], RDTIME[H], and RDINSTRET[H] instructions, turning the processor into an RV32E core.

Furthermore it is possible to choose between a dual-port and a single-port register file implementation. The former provides better performance while the latter results in a smaller core.

Note: In architectures that implement the register file in dedicated memory resources, such as many FPGAs, disabling the 16 upper registers and/or disabling the dual-port register file may not further reduce the core size.

The core exists in three variations: picorv32, picorv32_axi and picorv32_wb. The first provides a simple native memory interface that is easy to use in simple environments. picorv32_axi provides an AXI4-Lite master interface that can easily be integrated with existing systems that are already using the AXI standard. picorv32_wb provides a Wishbone master interface.

A separate core picorv32_axi_adapter is provided to bridge between the native memory interface and AXI4. This core can be used to create custom cores that include one or more PicoRV32 cores together with local RAM, ROM, and memory-mapped peripherals, communicating with each other using the native interface, and communicating with the outside world via AXI4.

The optional IRQ feature can be used to react to events from the outside, implement fault handlers, or catch instructions from a larger ISA and emulate them in software.

The optional Pico Co-Processor Interface (PCPI) can be used to implement non-branching instructions in an external coprocessor. Implementations of PCPI cores that implement the M Standard Extension instructions MUL[H[SU|U]] and DIV[U]/REM[U] are included in this package.

Files in this Repository

README.md

You are reading it right now.


picorv32.v

This Verilog file contains the following Verilog modules:

Module                  Description
picorv32                The PicoRV32 CPU
picorv32_axi            The version of the CPU with AXI4-Lite interface
picorv32_axi_adapter    Adapter from PicoRV32 Memory Interface to AXI4-Lite
picorv32_wb             The version of the CPU with Wishbone Master interface
picorv32_pcpi_mul       A PCPI core that implements the MUL[H[SU|U]] instructions
picorv32_pcpi_fast_mul  A version of picorv32_pcpi_mul using a single cycle multiplier
picorv32_pcpi_div       A PCPI core that implements the DIV[U]/REM[U] instructions

Simply copy this file into your project.

Makefile and testbenches

A basic test environment. Run make test to run the standard test bench (testbench.v) in the standard configurations. There are other test benches and configurations. See the test_* make targets in the Makefile for details.

Run make test_ez to run testbench_ez.v, a very simple test bench that does not require an external firmware .hex file. This can be useful in environments where the RISC-V compiler toolchain is not available.

Note: The test bench is using Icarus Verilog. However, Icarus Verilog 0.9.7 (the latest release at the time of writing) has a few bugs that prevent the test bench from running. Upgrade to the latest github master of Icarus Verilog to run the test bench.


firmware/

A simple test firmware. This runs the basic tests from tests/ and some C code, and tests IRQ handling and the multiply PCPI core.

All the code in firmware/ is in the public domain. Simply copy whatever you can use.

tests/

Simple instruction-level tests from riscv-tests.

dhrystone/

Another simple test firmware that runs the Dhrystone benchmark.

picosoc/

A simple example SoC using PicoRV32 that can execute code directly from a memory mapped SPI flash.

scripts/

Various scripts and examples for different (synthesis) tools and hardware architectures.

Verilog Module Parameters

The following Verilog module parameters can be used to configure the PicoRV32 core.

ENABLE_COUNTERS (default = 1)

This parameter enables support for the RDCYCLE[H], RDTIME[H], and RDINSTRET[H] instructions. These instructions will cause a hardware trap (like any other unsupported instruction) if ENABLE_COUNTERS is set to zero.

Note: Strictly speaking the RDCYCLE[H], RDTIME[H], and RDINSTRET[H] instructions are not optional for an RV32I core. But chances are they are not going to be missed after the application code has been debugged and profiled. These instructions are optional for an RV32E core.

ENABLE_COUNTERS64 (default = 1)

This parameter enables support for the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions. If this parameter is set to 0, and ENABLE_COUNTERS is set to 1, then only the RDCYCLE, RDTIME, and RDINSTRET instructions are available.

ENABLE_REGS_16_31 (default = 1)

This parameter enables support for the registers x16..x31. The RV32E ISA excludes these registers. However, the RV32E ISA spec requires a hardware trap when code tries to access these registers. This is not implemented in PicoRV32.


ENABLE_REGS_DUALPORT (default = 1)

The register file can be implemented with two or one read ports. A dual-ported register file improves performance a bit, but can also increase the size of the core.

LATCHED_MEM_RDATA (default = 0)

Set this to 1 if the mem_rdata is kept stable by the external circuit after a transaction. In the default configuration the PicoRV32 core only expects the mem_rdata input to be valid in the cycle with mem_valid && mem_ready and latches the value internally.

This parameter is only available for the picorv32 core. In the picorv32_axi and picorv32_wb cores this is implicitly set to 0.

TWO_STAGE_SHIFT (default = 1)

By default shift operations are performed in two stages: first shifts in units of 4 bits and then shifts in units of 1 bit. This speeds up shift operations, but adds additional hardware. Set this parameter to 0 to disable the two-stage shift to further reduce the size of the core.
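
The effect of the two-stage scheme can be illustrated with a small Python model. This is only a sketch of the idea, not a description of the core's RTL; the function name shift_steps is hypothetical. It counts how many shift steps a given shift amount needs when shifting first in units of 4 bits and then in units of 1 bit, compared to shifting one bit per step.

```python
def shift_steps(shamt: int, two_stage: bool = True) -> int:
    """Count shift steps for a shift amount of 0..31 (illustrative model only)."""
    if not 0 <= shamt <= 31:
        raise ValueError("RV32 shift amounts are 0..31")
    if two_stage:
        # First shift in units of 4 bits, then finish in units of 1 bit.
        return shamt // 4 + shamt % 4
    # Single-stage: one bit per step.
    return shamt


worst_two_stage = max(shift_steps(n) for n in range(32))
worst_one_stage = max(shift_steps(n, two_stage=False) for n in range(32))
```

Under this model the worst case (a shift by 31) drops from 31 steps to 10. How steps map to clock cycles in the actual core is not modeled here.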

BARREL_SHIFTER (default = 0)

By default shift operations are performed by successively shifting by a small amount (see TWO_STAGE_SHIFT above). With this option set, a barrel shifter is used instead.

TWO_CYCLE_COMPARE (default = 0)

This relaxes the longest data path a bit by adding an additional FF stage at the cost of adding an additional clock cycle delay to the conditional branch instructions.

Note: Enabling this parameter will be most effective when retiming (aka "register balancing") is enabled in the synthesis flow.

TWO_CYCLE_ALU (default = 0)

This adds an additional FF stage in the ALU data path, improving timing at the cost of an additional clock cycle for all instructions that use the ALU.

Note: Enabling this parameter will be most effective when retiming (aka "register balancing") is enabled in the synthesis flow.

COMPRESSED_ISA (default = 0)

This enables support for the RISC-V Compressed Instruction Set.

CATCH_MISALIGN (default = 1)

Set this to 0 to disable the circuitry for catching misaligned memory accesses.

CATCH_ILLINSN (default = 1)

Set this to 0 to disable the circuitry for catching illegal instructions.

The core will still trap on EBREAK instructions with this option set to 0. With IRQs enabled, an EBREAK normally triggers an IRQ 1. With this option set to 0, an EBREAK will trap the processor without triggering an interrupt.

ENABLE_PCPI (default = 0)

Set this to 1 to enable the external Pico Co-Processor Interface (PCPI). The external interface is not required for the internal PCPI cores, such as picorv32_pcpi_mul.

ENABLE_MUL (default = 0)

This parameter internally enables PCPI and instantiates the picorv32_pcpi_mul core that implements the MUL[H[SU|U]] instructions. The external PCPI interface only becomes functional when ENABLE_PCPI is set as well.

ENABLE_FAST_MUL (default = 0)

This parameter internally enables PCPI and instantiates the picorv32_pcpi_fast_mul core that implements the MUL[H[SU|U]] instructions. The external PCPI interface only becomes functional when ENABLE_PCPI is set as well.

If both ENABLE_MUL and ENABLE_FAST_MUL are set then the ENABLE_MUL setting will be ignored and the fast multiplier core will be instantiated.

ENABLE_DIV (default = 0)

This parameter internally enables PCPI and instantiates the picorv32_pcpi_div core that implements the DIV[U]/REM[U] instructions. The external PCPI interface only becomes functional when ENABLE_PCPI is set as well.

ENABLE_IRQ (default = 0)

Set this to 1 to enable IRQs. (see "Custom Instructions for IRQ Handling" below for a discussion of IRQs)

ENABLE_IRQ_QREGS (default = 1)

Set this to 0 to disable support for the getq and setq instructions. Without the q-registers, the irq return address will be stored in x3 (gp) and the IRQ bitmask in x4 (tp), the global pointer and thread pointer registers according to the RISC-V ABI. Code generated from ordinary C code will not interact with those registers.

Support for q-registers is always disabled when ENABLE_IRQ is set to 0.

ENABLE_IRQ_TIMER (default = 1)

Set this to 0 to disable support for the timer instruction.

Support for the timer is always disabled when ENABLE_IRQ is set to 0.

ENABLE_TRACE (default = 0)

Produce an execution trace using the trace_valid and trace_data output ports. For a demonstration of this feature run make test_vcd to create a trace file and then run python3 testbench.trace firmware/firmware.elf to decode it.

REGS_INIT_ZERO (default = 0)

Set this to 1 to initialize all registers to zero (using a Verilog initial block). This can be useful for simulation or formal verification.

MASKED_IRQ (default = 32'h 0000_0000)

A 1 bit in this bitmask corresponds to a permanently disabled IRQ.

LATCHED_IRQ (default = 32'h ffff_ffff)

A 1 bit in this bitmask indicates that the corresponding IRQ is "latched", i.e. when the IRQ line is high for only one cycle, the interrupt will be marked as pending and stay pending until the interrupt handler is called (aka "pulse interrupts" or "edge-triggered interrupts").

Set a bit in this bitmask to 0 to convert an interrupt line to operate as "level sensitive" interrupt.
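
As a rough aid for choosing MASKED_IRQ and LATCHED_IRQ values, the following Python sketch builds 32-bit masks from IRQ line numbers and prints them in the 32'h xxxx_xxxx notation used above. The helper names and the example IRQ assignments are hypothetical and only serve as illustration.

```python
def irq_bitmask(*irq_lines: int) -> int:
    """Return a 32-bit mask with a 1 bit for every listed IRQ line (0..31)."""
    mask = 0
    for line in irq_lines:
        if not 0 <= line <= 31:
            raise ValueError("PicoRV32 has 32 IRQ lines, numbered 0..31")
        mask |= 1 << line
    return mask


def verilog_hex(value: int) -> str:
    """Format a 32-bit value in the 32'h xxxx_xxxx style used in this document."""
    return "32'h {:04x}_{:04x}".format((value >> 16) & 0xFFFF, value & 0xFFFF)


# Hypothetical setup: IRQs 5 and 6 are level sensitive, all others latched.
latched_irq = 0xFFFF_FFFF & ~irq_bitmask(5, 6)
# Hypothetical setup: IRQs 16..31 are unused and permanently disabled.
masked_irq = irq_bitmask(*range(16, 32))

print("LATCHED_IRQ =", verilog_hex(latched_irq))
print("MASKED_IRQ  =", verilog_hex(masked_irq))
```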

PROGADDR_RESET (default = 32'h 0000_0000)

The start address of the program.

PROGADDR_IRQ (default = 32'h 0000_0010)

The start address of the interrupt handler.

STACKADDR (default = 32'h ffff_ffff)

When this parameter has a value different from 0xffffffff, then register x2 (the stack pointer) is initialized to this value on reset. (All other registers remain uninitialized.) Note that the RISC-V calling convention requires the stack pointer to be aligned on 16-byte boundaries (4 bytes for the RV32I soft float calling convention).

Cycles per Instruction Performance

A short reminder: This core is optimized for size and fmax, not performance.

Unless stated otherwise, the following numbers apply to a PicoRV32 with ENABLE_REGS_DUALPORT active and connected to a memory that can accommodate requests within one clock cycle.

The average Cycles per Instruction (CPI) is approximately 4, depending on the mix of instructions in the code. The CPI numbers for the individual instructions can be found in the table below. The column "CPI (SP)" contains the CPI numbers for a core built without ENABLE_REGS_DUALPORT.

Instruction            CPI    CPI (SP)
direct jump (jal)      3      3
ALU reg + immediate    3      3
ALU reg + reg          3      4
branch (not taken)     3      4
memory load            5      5
memory store           5      6
branch (taken)         5      6
indirect jump (jalr)   6      6
shift operations       4-14   4-15
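
To see how the per-instruction numbers combine into an average, the following Python sketch computes a weighted CPI from an instruction mix. The CPI values are the dual-port numbers from the table above; the instruction mix is a made-up example and not a measurement, and shift operations are left out because their CPI depends on the shift amount.

```python
# CPI values for the dual-port register file configuration (from the table above).
CPI_DUALPORT = {
    "jal": 3,
    "alu_imm": 3,
    "alu_reg": 3,
    "branch_not_taken": 3,
    "load": 5,
    "store": 5,
    "branch_taken": 5,
    "jalr": 6,
}


def average_cpi(mix: dict, cpi: dict = CPI_DUALPORT) -> float:
    """Weighted average CPI for an instruction mix given as fractions summing to 1."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(fraction * cpi[kind] for kind, fraction in mix.items())


# Hypothetical instruction mix, chosen only for illustration (not measured).
example_mix = {
    "alu_imm": 0.30,
    "alu_reg": 0.20,
    "load": 0.20,
    "store": 0.10,
    "branch_taken": 0.10,
    "branch_not_taken": 0.05,
    "jal": 0.03,
    "jalr": 0.02,
}

print("average CPI: %.2f" % average_cpi(example_mix))
```

For this assumed mix the result is about 3.86, in line with the "approximately 4" stated above.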

When ENABLE_MUL is activated, then a MUL instruction will execute in 40 cycles and a MULH[SU|U] instruction will execute in 72 cycles.

When ENABLE_DIV is activated, then a DIV[U]/REM[U] instruction will execute in 40 cycles.

When BARREL_SHIFTER is activated, a shift operation takes as long as any other ALU operation.

The following Dhrystone benchmark results are for a core with the ENABLE_FAST_MUL, ENABLE_DIV, and BARREL_SHIFTER options enabled.

Dhrystone benchmark results: 0.516 DMIPS/MHz (908 Dhrystones/Second/MHz)

For the Dhrystone benchmark the average CPI is 4.100.

Without using the look-ahead memory interface (usually required for max clock speed), these results drop to 0.305 DMIPS/MHz and 5.232 CPI.
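
The relation between the two Dhrystone figures can be checked with a few lines of Python. The sketch below assumes the conventional definition of 1 DMIPS as 1757 Dhrystones per second (the score of the VAX 11/780 reference machine); the function names and the 100 MHz clock are hypothetical examples.

```python
# One DMIPS is conventionally defined as 1757 Dhrystones per second.
DHRYSTONES_PER_DMIPS = 1757


def dmips_per_mhz(dhrystones_per_second_per_mhz: float) -> float:
    """Convert Dhrystones/Second/MHz to DMIPS/MHz."""
    return dhrystones_per_second_per_mhz / DHRYSTONES_PER_DMIPS


def dhrystones_per_second(dhrystones_per_second_per_mhz: float, clock_mhz: float) -> float:
    """Scale the per-MHz score to an absolute score at a given clock frequency."""
    return dhrystones_per_second_per_mhz * clock_mhz


print("%.4f DMIPS/MHz" % dmips_per_mhz(908))
# Hypothetical 100 MHz clock, for illustration only.
print("%d Dhrystones/Second at 100 MHz" % dhrystones_per_second(908, 100))
```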

PicoRV32 Native Memory Interface

The native memory interface of PicoRV32 is a simple valid-ready interface that can run one memory transfer at a time:

output        mem_valid
output        mem_instr
input         mem_ready

output [31:0] mem_addr
output [31:0] mem_wdata
output [ 3:0] mem_wstrb
input  [31:0] mem_rdata

The core initiates a memory transfer by asserting mem_valid. The valid signal stays high until the peer asserts mem_ready. All core outputs are stable over the mem_valid period. If the memory transfer is an instruction fetch, the core asserts mem_instr.

Read Transfer

In a read transfer mem_wstrb has the value 0 and mem_wdata is unused.

The memory reads the address mem_addr and makes the read value available on mem_rdata in the cycle mem_ready is high.

There is no need for an external wait cycle. The memory read can be implemented asynchronously with mem_ready going high in the same cycle as mem_valid, or mem_ready being tied to constant 1.

Write Transfer

In a write transfer mem_wstrb is not 0 and mem_rdata is unused. The memory writes the data at mem_wdata to the address mem_addr and acknowledges the transfer by asserting mem_ready.

The 4 bits of mem_wstrb are write enables for the four bytes in the addressed word. Only the 8 values 0000, 1111, 1100, 0011, 1000, 0100, 0010, and 0001 are possible, i.e. no write, write 32 bits, write upper 16 bits, write lower 16, or write a single byte respectively.

There is no need for an external wait cycle. The memory can acknowledge the write immediately with mem_ready going high in the same cycle as mem_valid, or mem_ready being tied to constant 1.
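
The read and write rules above can be summarized in a small behavioral model. The following Python sketch is not part of PicoRV32; the class WordMemory and its wrap-around addressing are assumptions made for illustration. It models a memory that answers every transfer immediately (as if mem_ready were tied to 1) and applies mem_wstrb as per-byte write enables.

```python
class WordMemory:
    """Toy word-organized memory that always answers in the same cycle."""

    def __init__(self, size_words: int):
        self.words = [0] * size_words

    def transfer(self, mem_addr: int, mem_wdata: int = 0, mem_wstrb: int = 0) -> int:
        """Perform one transfer and return the value for mem_rdata.

        mem_wstrb == 0 is a read; otherwise each set bit enables one byte lane.
        """
        # Ignore the two byte-offset bits; wrap around (assumption of this model).
        index = (mem_addr >> 2) % len(self.words)
        if mem_wstrb == 0:
            return self.words[index]
        word = self.words[index]
        for lane in range(4):
            if mem_wstrb & (1 << lane):
                byte_mask = 0xFF << (8 * lane)
                word = (word & ~byte_mask) | (mem_wdata & byte_mask)
        self.words[index] = word & 0xFFFF_FFFF
        return 0  # mem_rdata is unused for writes


mem = WordMemory(size_words=256)
mem.transfer(0x10, mem_wdata=0x11223344, mem_wstrb=0b1111)  # write 32 bits
mem.transfer(0x10, mem_wdata=0xAABBCCDD, mem_wstrb=0b1100)  # write upper 16 bits
mem.transfer(0x10, mem_wdata=0x000000EE, mem_wstrb=0b0001)  # write lowest byte
value = mem.transfer(0x10)                                   # read back
print(hex(value))
```

Note that the write data is expected on the byte lanes selected by mem_wstrb, which is why the second write only picks up the upper half of mem_wdata.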

Look-Ahead Interface

The PicoRV32 core also provides a "Look-Ahead Memory Interface" that provides all information about the next memory transfer one clock cycle earlier than the normal interface.

output        mem_la_read
output        mem_la_write
output [31:0] mem_la_addr
output [31:0] mem_la_wdata
output [ 3:0] mem_la_wstrb

In the clock cycle before mem_valid goes high, this interface will output a pulse on mem_la_read or mem_la_write to indicate the start of a read or write transaction in the next clock cycle.

Note: The signals mem_la_read, mem_la_write, and mem_la_addr are driven by combinatorial circuits within the PicoRV32 core. It might be harder to achieve timing closure with the look-ahead interface than with the normal memory interface described above.

Pico Co-Processor Interface (PCPI)

The Pico Co-Processor Interface (PCPI) can be used to implement non-branching instructions in external cores:

output        pcpi_valid
output [31:0] pcpi_insn
output [31:0] pcpi_rs1
output [31:0] pcpi_rs2
input         pcpi_wr
input  [31:0] pcpi_rd
input         pcpi_wait
input         pcpi_ready

When an unsupported instruction is encountered and the PCPI feature is activated (see ENABLE_PCPI above), then pcpi_valid is asserted, the instruction word itself is output on pcpi_insn, the rs1 and rs2 fields are decoded and the values in those registers are output on pcpi_rs1 and pcpi_rs2.

An external PCPI core can then decode the instruction, execute it, and assert pcpi_ready when execution of the instruction is finished. Optionally a result value can be written to pcpi_rd and pcpi_wr asserted. The PicoRV32 core will then decode the rd field of the instruction and write the value from pcpi_rd to the respective register.

When no external PCPI core acknowledges the instruction within 16 clock cycles, then an illegal instruction exception is raised and the respective interrupt handler is called. A PCPI core that needs more than a couple of cycles to execute an instruction, should assert pcpi_wait as soon as the instruction has been decoded successfully and keep it asserted until it asserts pcpi_ready. This will prevent the PicoRV32 core from raising an illegal instruction exception.
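
A PCPI core first has to decide whether the word on pcpi_insn is one of its instructions. As a rough illustration, the following Python sketch splits an instruction word into the standard RISC-V R-type fields and checks for the MUL instruction. The function names are hypothetical; the field positions and the MUL encoding are the standard RV32M ones.

```python
def decode_pcpi_insn(pcpi_insn: int) -> dict:
    """Split a 32-bit R-type instruction word into its standard RISC-V fields."""
    return {
        "opcode": pcpi_insn & 0x7F,          # bits  6:0
        "rd": (pcpi_insn >> 7) & 0x1F,       # bits 11:7
        "funct3": (pcpi_insn >> 12) & 0x7,   # bits 14:12
        "rs1": (pcpi_insn >> 15) & 0x1F,     # bits 19:15
        "rs2": (pcpi_insn >> 20) & 0x1F,     # bits 24:20
        "funct7": (pcpi_insn >> 25) & 0x7F,  # bits 31:25
    }


def is_mul(pcpi_insn: int) -> bool:
    """True for the RV32M MUL instruction (opcode OP, funct7 = 1, funct3 = 0)."""
    f = decode_pcpi_insn(pcpi_insn)
    return f["opcode"] == 0b0110011 and f["funct7"] == 0b0000001 and f["funct3"] == 0b000


# mul a0, a1, a2  ->  rd = x10, rs1 = x11, rs2 = x12
MUL_A0_A1_A2 = 0x02C58533
fields = decode_pcpi_insn(MUL_A0_A1_A2)
print(fields, is_mul(MUL_A0_A1_A2))
```

A core that does not recognize the instruction would simply leave pcpi_wait and pcpi_ready deasserted, so that the timeout described above applies.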

Custom Instructions for IRQ Handling

Note: The IRQ handling features in PicoRV32 do not follow the RISC-V Privileged ISA specification. Instead a small set of very simple custom instructions is used to implement IRQ handling with minimal hardware overhead.

The following custom instructions are only supported when IRQs are enabled via the ENABLE_IRQ parameter (see above).

The PicoRV32 core has a built-in interrupt controller with 32 interrupt inputs. An interrupt can be triggered by asserting the corresponding bit in the irq input of the core.

When the interrupt handler is started, the eoi End Of Interrupt (EOI) signals for the handled interrupts go high. The eoi signals go low again when the interrupt handler returns.

The IRQs 0-2 can be triggered internally by the following built-in interrupt sources:

IRQ   Interrupt Source
0     Timer Interrupt
1     EBREAK/ECALL or Illegal Instruction
2     BUS Error (Unaligned Memory Access)

These interrupts can also be triggered by external sources, such as co-processors connected via PCPI.

The core has 4 additional 32-bit registers q0 .. q3 that are used for IRQ handling. When the IRQ handler is called, the register q0 contains the return address and q1 contains a bitmask of all IRQs to be handled. This means one call to the interrupt handler needs to service more than one IRQ when more than one bit is set in q1.
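
The consequence of the q1 bitmask is that a handler should loop over all set bits instead of assuming a single interrupt source. The following Python sketch shows that dispatch logic in isolation; it is not the firmware's handler, and the names pending_irqs, dispatch, and the two example handlers are hypothetical.

```python
def pending_irqs(q1: int) -> list:
    """Return the IRQ numbers whose bits are set in the q1 bitmask, lowest first."""
    return [irq for irq in range(32) if q1 & (1 << irq)]


def dispatch(q1: int, handlers: dict) -> list:
    """Call the handler registered for every pending IRQ; return the IRQs served."""
    served = []
    for irq in pending_irqs(q1):
        handler = handlers.get(irq)
        if handler is not None:
            handler(irq)
            served.append(irq)
    return served


log = []
# Hypothetical handlers for the timer (IRQ 0) and an external source on IRQ 5.
handlers = {0: lambda n: log.append("timer"), 5: lambda n: log.append("external")}
served = dispatch(0b100001, handlers)  # bits 0 and 5 set in q1
print(served, log)
```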

When support for compressed instructions is enabled, then the LSB of q0 is set when the interrupted instruction is a compressed instruction. This can be used if the IRQ handler wants to decode the interrupted instruction.

Registers q2 and q3 are uninitialized and can be used as temporary storage when saving/restoring register values in the IRQ handler.

All of the following instructions are encoded under the custom0 opcode. The f3 and rs2 fields are ignored in all these instructions.

See firmware/custom_ops.S for GNU assembler macros that implement mnemonics for these instructions.

See firmware/start.S for an example implementation of an interrupt handler assembler wrapper, and firmware/irq.c for the actual interrupt handler.

getq rd, qs

This instruction copies the value from a q-register to a general-purpose register.

0000000 ----- 000XX --- XXXXX 0001011
f7      rs2   qs    f3  rd    opcode


getq x5, q2

setq qd, rs

This instruction copies the value from a general-purpose register to a q-register.

0000001 ----- XXXXX --- 000XX 0001011
f7      rs2   rs    f3  qd    opcode


setq q2, x5


retirq

Return from interrupt. This instruction copies the value from q0 to the program counter and re-enables interrupts.

0000010 ----- 00000 --- 00000 0001011
f7      rs2   rs    f3  rd    opcode




maskirq rd, rs

The "IRQ Mask" register contains a bitmask of masked (disabled) interrupts. This instruction writes a new value to the IRQ mask register and reads the old value.

0000011 ----- XXXXX --- XXXXX 0001011
f7      rs2   rs    f3  rd    opcode


maskirq x1, x2

The processor starts with all interrupts disabled.

An illegal instruction or bus error while the illegal instruction or bus error interrupt is disabled will cause the processor to halt.


waitirq rd

Pause execution until an interrupt becomes pending. The bitmask of pending IRQs is written to rd.

0000100 ----- 00000 --- XXXXX 0001011
f7      rs2   rs    f3  rd    opcode


waitirq x1


timer rd, rs

Reset the timer counter to a new value. The counter counts down clock cycles and triggers the timer interrupt when transitioning from 1 to 0. Setting the counter to zero disables the timer. The old value of the counter is written to rd.

0000101 ----- XXXXX --- XXXXX 0001011
f7      rs2   rs    f3  rd    opcode


timer x1, x2
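
The bit layouts above can be turned into instruction words mechanically. The following Python sketch assembles the custom instructions from their fields; it is an illustration of the encodings shown above and not a replacement for the macros in firmware/custom_ops.S. The function names are hypothetical, and the ignored f3 and rs2 fields are set to 0 here.

```python
CUSTOM0 = 0b0001011  # opcode shared by all custom IRQ instructions


def encode(f7: int, rs: int = 0, rd: int = 0) -> int:
    """Assemble f7 | rs2 = 0 | rs | f3 = 0 | rd | opcode into a 32-bit word."""
    if not (0 <= rs < 32 and 0 <= rd < 32):
        raise ValueError("register numbers must be in 0..31")
    return (f7 << 25) | (rs << 15) | (rd << 7) | CUSTOM0


def q_index(q: int) -> int:
    """Validate a q-register number (q0..q3)."""
    if not 0 <= q <= 3:
        raise ValueError("q-registers are q0..q3")
    return q


def getq(rd: int, qs: int) -> int:
    return encode(0b0000000, rs=q_index(qs), rd=rd)


def setq(qd: int, rs: int) -> int:
    return encode(0b0000001, rs=rs, rd=q_index(qd))


def retirq() -> int:
    return encode(0b0000010)


def maskirq(rd: int, rs: int) -> int:
    return encode(0b0000011, rs=rs, rd=rd)


def waitirq(rd: int) -> int:
    return encode(0b0000100, rd=rd)


def timer(rd: int, rs: int) -> int:
    return encode(0b0000101, rs=rs, rd=rd)


# The examples from the text above: getq x5, q2 / setq q2, x5 / retirq
for word in (getq(5, 2), setq(2, 5), retirq()):
    print("0x%08x" % word)
```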

Building a pure RV32I Toolchain

TL;DR: Run the following commands to build the complete toolchain:

make download-tools
make -j$(nproc) build-tools

The default settings in the riscv-tools build scripts will build a compiler, assembler and linker that can target any RISC-V ISA, but the libraries are built for RV32G and RV64G targets. Follow the instructions below to build a complete toolchain (including libraries) that targets a pure RV32I CPU.

The following commands will build the RISC-V GNU toolchain and libraries for a pure RV32I target, and install it in /opt/riscv32i:

# Ubuntu packages needed:
sudo apt-get install autoconf automake autotools-dev curl libmpc-dev \
        libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo \
    gperf libtool patchutils bc zlib1g-dev git libexpat1-dev

sudo mkdir /opt/riscv32i
sudo chown $USER /opt/riscv32i

git clone riscv-gnu-toolchain-rv32i
cd riscv-gnu-toolchain-rv32i
git checkout 411d134
git submodule update --init --recursive

mkdir build; cd build
../configure --with-arch=rv32i --prefix=/opt/riscv32i
make -j$(nproc)

The commands will all be named using the prefix riscv32-unknown-elf-, which makes it easy to install them side-by-side with the regular riscv-tools (those are using the name prefix riscv64-unknown-elf- by default).

Alternatively you can simply use one of the following make targets from PicoRV32's Makefile to build an RV32I[M][C] toolchain. You still need to install all prerequisites, as described above. Then run any of the following commands in the PicoRV32 source directory:

Command                                   Install Directory   ISA
make -j$(nproc) build-riscv32i-tools      /opt/riscv32i/      RV32I
make -j$(nproc) build-riscv32ic-tools     /opt/riscv32ic/     RV32IC
make -j$(nproc) build-riscv32im-tools     /opt/riscv32im/     RV32IM
make -j$(nproc) build-riscv32imc-tools    /opt/riscv32imc/    RV32IMC

Or simply run make -j$(nproc) build-tools to build and install all four tool chains.

By default calling any of those make targets will (re-)download the toolchain sources. Run make download-tools to download the sources to /var/cache/distfiles/ once in advance.

Note: These instructions are for git rev 411d134 (2018-02-14) of riscv-gnu-toolchain.

Linking binaries with newlib for PicoRV32

The tool chains (see last section for install instructions) come with a version of the newlib C standard library.

Use the linker script firmware/riscv.ld for linking binaries against the newlib library. Using this linker script will create a binary that has its entry point at 0x10000. (The default linker script does not have a static entry point, thus a proper ELF loader would be needed that can determine the entry point at runtime while loading the program.)

Newlib comes with a few syscall stubs. You need to provide your own implementation of those syscalls and link your program with this implementation, overwriting the default stubs from newlib. See syscalls.c in scripts/cxxdemo/ for an example of how to do that.

Evaluation: Timing and Utilization on Xilinx 7-Series FPGAs

The following evaluations have been performed with Vivado 2017.3.

Timing on Xilinx 7-Series FPGAs

The picorv32_axi module with enabled TWO_CYCLE_ALU has been placed and routed for Xilinx Artix-7T, Kintex-7T, Virtex-7T, Kintex UltraScale, and Virtex UltraScale devices in all speed grades. A binary search is used to find the shortest clock period for which the design meets timing.

See make table.txt in scripts/vivado/.

Family                      Device                  Speedgrade   Clock Period (Freq.)
Xilinx Kintex-7T            xc7k70t-fbg676-2        -2           2.4 ns (416 MHz)
Xilinx Kintex-7T            xc7k70t-fbg676-3        -3           2.2 ns (454 MHz)
Xilinx Virtex-7T            xc7v585t-ffg1761-2      -2           2.3 ns (434 MHz)
Xilinx Virtex-7T            xc7v585t-ffg1761-3      -3           2.2 ns (454 MHz)
Xilinx Kintex UltraScale    xcku035-fbva676-2-e     -2           2.0 ns (500 MHz)
Xilinx Kintex UltraScale    xcku035-fbva676-3-e     -3           1.8 ns (555 MHz)
Xilinx Virtex UltraScale    xcvu065-ffvc1517-2-e    -2           2.1 ns (476 MHz)
Xilinx Virtex UltraScale    xcvu065-ffvc1517-3-e    -3           2.0 ns (500 MHz)
Xilinx Kintex UltraScale+   xcku3p-ffva676-2-e      -2           1.4 ns (714 MHz)
Xilinx Kintex UltraScale+   xcku3p-ffva676-3-e      -3           1.3 ns (769 MHz)
Xilinx Virtex UltraScale+   xcvu3p-ffvc1517-2-e     -2           1.5 ns (666 MHz)
Xilinx Virtex UltraScale+   xcvu3p-ffvc1517-3-e     -3           1.4 ns (714 MHz)
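
The frequencies in parentheses follow directly from the clock periods. The following Python sketch shows the conversion; the function name freq_mhz is hypothetical, and truncating to an integer is an assumption that happens to reproduce the values listed above (for example 1.5 ns gives 666 MHz rather than 667 MHz).

```python
def freq_mhz(clock_period_ns: float) -> int:
    """Convert a clock period in ns to a frequency in MHz, truncated to an integer."""
    return int(1000.0 / clock_period_ns)


# Clock periods taken from the table above.
for period in (2.4, 2.2, 2.0, 1.5, 1.3):
    print("%.1f ns -> %d MHz" % (period, freq_mhz(period)))
```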

Utilization on Xilinx 7-Series FPGAs

The following table lists the resource utilization in area-optimized synthesis for the following three cores:

  • PicoRV32 (small): The picorv32 module without counter instructions, without two-stage shifts, with externally latched mem_rdata, and without catching of misaligned memory accesses and illegal instructions.

  • PicoRV32 (regular): The picorv32 module in its default configuration.

  • PicoRV32 (large): The picorv32 module with enabled PCPI, IRQ, MUL, DIV, BARREL_SHIFTER, and COMPRESSED_ISA features.

See make area in scripts/vivado/.

Core Variant         Slice LUTs   LUTs as Memory   Slice Registers
PicoRV32 (small)     761          48               442
PicoRV32 (regular)   917          48               583
PicoRV32 (large)     2019         88               1085

picorv32's Issues

placement fails on iCeCube2

When I run picorv32 with iCEcube2, it fails in the placer (Windows) with the error below. I suspect it's a multiple-assignment issue on decoder_trigger and decoder_pseudo_trigger, or Synplify optimizing that bit away due to the
decoder_pseudo_trigger <= 0;
assignment. I'll try to get a sim going to debug this more.

Design Statistics after Packing
Number of LUTs : 1545
Number of DFFs : 608
Number of DFFs packed to IO : 0
Number of Carrys : 233
Device Utilization Summary after Packing
Sequential LogicCells
LUT and DFF : 502
LUT, DFF and CARRY : 106
Combinational LogicCells
Only LUT : 872
CARRY Only : 62
LUT with CARRY : 65
LogicCells : 1607/7680
PLBs : 211/960
BRAMs : 8/32
IOs and GBIOs : 9/206
PLLs : 0/2
I2088: Phase 3, elapsed time : 2.3 (sec)
Phase 4
I2712: Tool unable to find location for GB cpu.decoder_trigger_RNIV293
Error during global Buffer placement

spiflash_tb fails

Running make spiflash_tb in picosoc gives me

iverilog -s testbench -o spiflash_tb.vvp spiflash.v spiflash_tb.v
riscv32-unknown-elf-gcc -march=rv32imc -Wl,-Bstatic,-T,,--strip-debug -ffreestanding -nostdlib -o firmware.elf start.s firmware.c
riscv32-unknown-elf-objcopy -O verilog firmware.elf /dev/stdout | sed -e '1 s/@00000000/@00100000/; 2,65537 d;' > firmware.hex
vvp -N spiflash_tb.vvp
VCD info: dumpfile spiflash_tb.vcd opened for output.

Reset (FFh)
-- SPI SDR ff 00
-- END

Power Up (ABh)
-- SPI SDR ab 00
-- END

Read Data (03h)
-- SPI SDR 03 00
-- SPI SDR 10 03
-- SPI SDR 00 10
-- SPI SDR 00 00
-- SPI SDR 00 93
-- SPI SDR 00 00
ERROR: Got 00 (00000000) but expected 02 (00000010).
-- SPI SDR 00 00
ERROR: Got 00 (00000000) but expected 30 (00110000).
-- SPI SDR 00 00
ERROR: Got 00 (00000000) but expected 01 (00000001).
-- SPI SDR 00 93
ERROR: Got 93 (10010011) but expected 23 (00100011).
-- SPI SDR 00 01
ERROR: Got 01 (00000001) but expected 22 (00100010).
-- SPI SDR 00 00
ERROR: Got 00 (00000000) but expected 50 (01010000).
-- SPI SDR 00 00
-- END

Quad I/O Read (EBh)
-- SPI SDR eb 00
-- QSPI SDR 10 --
-- QSPI SDR 00 --
-- QSPI SDR 00 --
-- QSPI SDR a5 --
-- QSPI SDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 93 (10010011).
-- QSPI SDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 02 (00000010).
-- QSPI SDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 30 (00110000).
-- QSPI SDR -- z9
ERROR: Got z9 (zzzz1001) but expected 01 (00000001).
-- QSPI SDR -- 30
ERROR: Got 30 (00110000) but expected 23 (00100011).
-- QSPI SDR -- 00
ERROR: Got 00 (00000000) but expected 22 (00100010).
-- QSPI SDR -- 00
ERROR: Got 00 (00000000) but expected 50 (01010000).
-- QSPI SDR -- 09
ERROR: Got 09 (00001001) but expected 00 (00000000).
-- END

Continous Quad I/O Read
-- QSPI SDR 10 --
-- QSPI SDR 00 --
-- QSPI SDR 00 --
-- QSPI SDR ff --
-- QSPI SDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 93 (10010011).
-- QSPI SDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 02 (00000010).
-- QSPI SDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 30 (00110000).
-- QSPI SDR -- z9
ERROR: Got z9 (zzzz1001) but expected 01 (00000001).
-- QSPI SDR -- 30
ERROR: Got 30 (00110000) but expected 23 (00100011).
-- QSPI SDR -- 00
ERROR: Got 00 (00000000) but expected 22 (00100010).
-- QSPI SDR -- 00
ERROR: Got 00 (00000000) but expected 50 (01010000).
-- QSPI SDR -- 09
ERROR: Got 09 (00001001) but expected 00 (00000000).
-- END

DDR Quad I/O Read (EDh)
-- SPI SDR ed 00
-- QSPI DDR 10 --
-- QSPI DDR 00 --
-- QSPI DDR 00 --
-- QSPI DDR a5 --
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 93 (10010011).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 02 (00000010).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 30 (00110000).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 01 (00000001).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 23 (00100011).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 22 (00100010).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 50 (01010000).
-- QSPI DDR -- 93
ERROR: Got 93 (10010011) but expected 00 (00000000).
-- END

Continous DDR Quad I/O Read
-- QSPI DDR 10 --
-- QSPI DDR 00 --
-- QSPI DDR 00 --
-- QSPI DDR ff --
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 93 (10010011).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 02 (00000010).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 30 (00110000).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 01 (00000001).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 23 (00100011).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 22 (00100010).
-- QSPI DDR -- zz
ERROR: Got zz (zzzzzzzz) but expected 50 (01010000).
-- QSPI DDR -- 93
ERROR: Got 93 (10010011) but expected 00 (00000000).
-- END

make: *** [Makefile:50: spiflash_tb] Error 1

Tagged release of picorv32

I'm considering updating the picorv32 core for the FuseSoC standard library, but it would be great to have a tagged release that I can use.

Opcode after jump executed on error

To do some basic tests with your PicoRV32, I wrote a testbench, connected a ROM with a simple
bare-metal program in it, and let it run.

I started with a tiny C program that I compiled with the riscv-gnu-toolchain:

#include <stdio.h>
void main(void)
{
  register int cpureg15 asm ("a5") = 0;
  //a5 = ABI-name of R15

  while (1)
  {
    cpureg15++;  // inferred from the "addi a5,a5,1" in the disassembly below
  }
}

It compiled to:

  -- Test 1
00000000 <main>:
   0:   ff010113            addi    sp,sp,-16
   4:   00812623            sw  s0,12(sp)
   8:   01010413            addi    s0,sp,16
   c:   00000793            li  a5,0
  10:   00178793            addi    a5,a5,1
  14:   ffdff06f            j   10 <main+0x10>  


The first thing I noticed was an xxxxxxxx on mem_addr[31:0] at about 150ns.
I NOP-ed the first three words since they were not required, and the xxxxxxxx was gone.

  -- Test 2
00000000 <main>:
   0:   00000013            nop
   4:   00000013            nop
   8:   00000013            nop
   c:   00000793            li  a5,0
  10:   00178793            addi    a5,a5,1
  14:   ffdff06f            j   10 <main+0x10>  


I decided to go after this effect later on.

The next things I saw were the TRAP at 275ns, mem_addr=0x00000018 at 205ns, and
mem_rdata=0x00000000 at 215ns.

As the RISC-V spec points out on page 6 (Instruction length encoding), an opcode
of 0x00000000 shall lead to a trap. Since I had indeed zeroed the unused portion
of the ROM, I added a NOP at the (technically unused) address 0x18.
Now the TRAP was gone, but register 15 was stuck at zero and was obviously not incremented.
No wonder, since the two instructions that executed were at 0x14 and 0x18 (JUMP and NOP).

  -- Test 3
00000000 <main>:
   0:   00000013            nop
   4:   00000013            nop
   8:   00000013            nop
   c:   00000793            li  a5,0
  10:   00178793            addi    a5,a5,1
  14:   ffdff06f            j   10 <main+0x10>  
  18:   00000013            nop


To ensure that the NOP at 0x18 was not only prefetched but also executed, I swapped the opcodes
of 0x10 and 0x18. Now R15 began incrementing.

  -- Test 4
00000000 <main>:
   0:   00000013            nop
   4:   00000013            nop
   8:   00000013            nop
   c:   00000793            li  a5,0
  10:   00000013            nop
  14:   ffdff06f            j   10 <main+0x10>  
  18:   00178793            addi    a5,a5,1


I swapped the opcodes back and calculated a jump further back, aiming at 0x0C.
In fact 0x10, 0x14 and 0x18 were executed and R15 was incrementing.

  -- Test 5
00000000 <main>:
   0:   00000013            nop
   4:   00000013            nop
   8:   00000013            nop
   c:   00000793            li  a5,0
  10:   00178793            addi    a5,a5,1
  14:   ff9ff06f            j   0c <main+0x0c>  
  18:   00000013            nop
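
The jump words used in these tests can be cross-checked by hand. The following is a minimal sketch, assuming the standard RISC-V J-type layout; `encode_jal` is a hypothetical helper name, not something from the PicoRV32 sources.

```c
#include <stdint.h>

/* Encode a RISC-V JAL instruction (J-type). `offset` is the byte distance
 * from the jump itself to its target; it must be even and fit in 21 bits.
 * encode_jal is a hypothetical helper for illustration only. */
static uint32_t encode_jal(unsigned rd, int32_t offset)
{
    uint32_t imm = (uint32_t)offset & 0x1FFFFFu;   /* 21-bit two's complement */
    return (((imm >> 20) & 0x1u)   << 31)          /* imm[20]    -> inst[31]    */
         | (((imm >> 1)  & 0x3FFu) << 21)          /* imm[10:1]  -> inst[30:21] */
         | (((imm >> 11) & 0x1u)   << 20)          /* imm[11]    -> inst[20]    */
         | (((imm >> 12) & 0xFFu)  << 12)          /* imm[19:12] -> inst[19:12] */
         | ((uint32_t)(rd & 0x1Fu) << 7)           /* destination register      */
         | 0x6Fu;                                  /* JAL major opcode          */
}
```

With rd = x0, an offset of -4 (from 0x14 to 0x10) gives 0xffdff06f and an offset of -8 (from 0x14 to 0x0c) gives 0xff9ff06f, matching the words in the listings.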


I replaced the NOP at 0x18 with a second increment and in fact R15 got incremented twice per loop.

  -- Test 6
00000000 <main>:
   0:   00000013            nop
   4:   00000013            nop
   8:   00000013            nop
   c:   00000793            li  a5,0
  10:   00178793            addi    a5,a5,1
  14:   ff9ff06f            j   0c <main+0x0c>  
  18:   00178793            addi    a5,a5,1


My guess is that the offset of the jump instruction is added correctly to the program counter,
but only after the opcode following the jump instruction has been executed by mistake.

My question is: can you reproduce this effect, or is something wrong in my setup?
Sadly I'm no Verilog guy, so there is no use in trying to debug your code on my own.

Vivado 2016.1, ARTY-board (Artix-7 XC7A35T), riscv32-unknown-elf-gcc (GCC) 5.3.0


Interpreting mem_wstrb


I'm trying to understand the PicoRV32 native memory interface. How should mem_wstrb be interpreted in the != 0 case?
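
One way to read the signal, going by the testbench code quoted further down this page: mem_wstrb == 0 marks a read, and otherwise each bit of mem_wstrb enables the write of one byte lane of mem_wdata. The sketch below models that reading in C; `apply_wstrb` is a hypothetical name chosen for illustration.

```c
#include <stdint.h>

/* Model of one 32-bit memory word updated through byte strobes:
 * bit i of wstrb enables the write of byte lane i (bits 8*i+7 .. 8*i).
 * With wstrb == 0 nothing is written, i.e. the transfer is a read. */
static uint32_t apply_wstrb(uint32_t old_word, uint32_t wdata, unsigned wstrb)
{
    uint32_t mask = 0;
    for (unsigned lane = 0; lane < 4; lane++)
        if (wstrb & (1u << lane))
            mask |= 0xFFu << (8 * lane);
    return (old_word & ~mask) | (wdata & mask);
}
```

Under this model a strobe of 1111 replaces the whole word, 0011 and 1100 replace the lower and upper halfword, and a single set bit replaces one byte.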

custom0 opcode not defined in riscv-gnu-toolchain

How is the custom0 opcode defined for the RV32 toolchain?

I am trying to work with your IRQ examples in the firmware directory.

When I compile with the riscv32im toolchain I am getting unrecognized opcode custom0 errors:

start.S: Assembler messages:
start.S:22: Error: unrecognized opcode `custom0 2,x1,0,1'
start.S:23: Error: unrecognized opcode `custom0 3,x2,0,1'
start.S:28: Error: unrecognized opcode `custom0 x2,0,0,0'
start.S:31: Error: unrecognized opcode `custom0 x2,2,0,0'
start.S:34: Error: unrecognized opcode `custom0 x2,3,0,0'
start.S:77: Error: unrecognized opcode `custom0 a1,1,0,0'
start.S:88: Error: unrecognized opcode `custom0 0,x2,0,1'
start.S:91: Error: unrecognized opcode `custom0 1,x2,0,1'
start.S:94: Error: unrecognized opcode `custom0 2,x2,0,1'
start.S:126: Error: unrecognized opcode `custom0 x1,1,0,0'
start.S:127: Error: unrecognized opcode `custom0 x2,2,0,0'
start.S:129: Error: unrecognized opcode `custom0 0,0,0,2'

Does the custom0 opcode need to be patched into the binutils riscv-opc.c file?
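
A possible way around assembler support is to compute the instruction words directly. The sketch below assumes that the failing lines use the operand order `custom0 rd,rs1,rs2,funct7`, that funct3 can be left at 0, and that custom0 means the RISC-V custom-0 major opcode 0001011. `r_type` and `custom0_word` are hypothetical helper names.

```c
#include <stdint.h>

#define OPCODE_CUSTOM0 0x0Bu  /* 0b0001011, the RISC-V custom-0 major opcode */

/* Pack R-type fields into a 32-bit instruction word. */
static uint32_t r_type(unsigned f7, unsigned rs2, unsigned rs1,
                       unsigned f3, unsigned rd, unsigned opcode)
{
    return ((uint32_t)(f7  & 0x7Fu) << 25) | ((uint32_t)(rs2 & 0x1Fu) << 20)
         | ((uint32_t)(rs1 & 0x1Fu) << 15) | ((uint32_t)(f3  & 0x7u)  << 12)
         | ((uint32_t)(rd  & 0x1Fu) << 7)  | (opcode & 0x7Fu);
}

/* Hypothetical helper: word for a line of the form `custom0 rd,rs1,rs2,f7`,
 * assuming that operand order and funct3 = 0. */
static uint32_t custom0_word(unsigned rd, unsigned rs1, unsigned rs2, unsigned f7)
{
    return r_type(f7, rs2, rs1, 0, rd, OPCODE_CUSTOM0);
}
```

Under these assumptions, `custom0 0,0,0,2` from start.S:129 becomes 0x0400000b, which could be emitted with a plain `.word 0x0400000b` directive instead of the unrecognized mnemonic.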

IceStorm compatibility?


I wanted to reach out and ask whether this would be compatible with a BlackIce board. I'd love to use the IceStorm tools to get this running!

Thank you!

Synthesis simulation mismatch, when targeted for xc6slx9-2tqg144 using XILINX ISE 14.7

Hi Clifford

  1. Tried targeting the picorv32 to a Spartan-6 board (Papilio Pro). Unfortunately the port didn't run on the
    target. On further investigation, I noticed that while behavioral simulation worked, none of the
    post-translate, post-map, or post-place-and-route simulations worked. They seem to generate
    a trap after a few cycles! The result is the same for both FAST_MEMRY=0 and 1.
    Any pointers to debug this?
  2. Is there a GDB stub or similar that can be used to debug the firmware, for example using a serial

riscv-gcc commit does not exist?

Hi Clifford -

Tried running your makefile / build scripts for the RISC-V GCC toolchain and got the following error on two computers:

Cloning into 'riscv-gcc'...
Checking connectivity... done.
fatal: reference is not a tree: 4fb4d8f9e9ac8a28d6ea5117688eadbcd0f7978e
Unable to checkout '4fb4d8f9e9ac8a28d6ea5117688eadbcd0f7978e' in submodule path 'riscv-gcc'
make[1]: *** [build-riscv32im-tools-bh] Error 1
make[1]: Leaving directory `/home/drichmond/Research/repositories/git/picorv32'
make: *** [build-riscv32im-tools] Error 2

Perhaps the commit tree changed recently? I was able to compile this code using your instructions several days ago.

Verilator generated executable didn't run far...

Using the following verilator commands:
verilator -Wno-lint -Wno-MULTIDRIVEN -trace --top-module picorv32_wrapper --cc testbench.v picorv32.v --exe
cd obj_dir/
make -j -f
Got this error message:
TRAP after 8428 clock cycles
ERROR!
%Error: testbench.v:268: Verilog $stop
Aborting...
Aborted (core dumped)

Issue while Building the RV32I Toolchain

I was trying to build the RV32I toolchain and got an error at this step ../configure --with-arch=RV32I --prefix=/opt/riscv32i:

checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for grep that handles long lines and -e... /bin/grep
checking for fgrep... /bin/grep -F
checking for grep that handles long lines and -e... (cached) /bin/grep
checking for bash... /bin/bash
checking for __gmpz_init in -lgmp... yes
checking for mpfr_init in -lmpfr... yes
checking for mpc_init2 in -lmpc... yes
checking for curl... /usr/share/centrifydc/bin/curl
checking for wget... /usr/bin/wget
checking for ftp... /usr/bin/ftp
configure: error: Unknown arch

Did anyone else also face this problem?

decoded_imm_uj may cause confusion

When I first met decoded_imm_uj, I assumed it was used for both the U-immediate and the J-immediate. But it is only used as the J-immediate:

// Extract the J-immediate and sign-extend it
{ decoded_imm_uj[31:20], decoded_imm_uj[10:1], decoded_imm_uj[11], decoded_imm_uj[19:12], decoded_imm_uj[0] } <= $signed({mem_rdata_latched[31:12], 1'b0});

decoded_imm_uj is only used for JAL; the U-immediate is extracted from the instruction directly:

case (1'b1)
  instr_jal:
    decoded_imm <= decoded_imm_uj;
  |{instr_lui, instr_auipc}:
    decoded_imm <= mem_rdata_q[31:12] << 12;

Given the above, I suggest renaming decoded_imm_uj to decoded_imm_j.
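
To make the difference between the two immediate formats concrete, here is a small software model, assuming the standard RISC-V J-type and U-type layouts. `decode_j_imm` and `decode_u_imm` are hypothetical names and are not taken from picorv32.v.

```c
#include <stdint.h>

/* Extract and sign-extend the J-type immediate of a 32-bit JAL instruction. */
static int32_t decode_j_imm(uint32_t inst)
{
    uint32_t imm = (((inst >> 31) & 0x1u)   << 20)   /* inst[31]    -> imm[20]    */
                 | (((inst >> 12) & 0xFFu)  << 12)   /* inst[19:12] -> imm[19:12] */
                 | (((inst >> 20) & 0x1u)   << 11)   /* inst[20]    -> imm[11]    */
                 | (((inst >> 21) & 0x3FFu) << 1);   /* inst[30:21] -> imm[10:1]  */
    /* Sign-extend from bit 20. */
    return (int32_t)(imm ^ 0x100000u) - 0x100000;
}

/* Extract the U-type immediate (LUI/AUIPC): upper 20 bits, low 12 bits zero. */
static uint32_t decode_u_imm(uint32_t inst)
{
    return inst & 0xFFFFF000u;
}
```

The J-type path has to shuffle bit fields, while the U-type path is a plain mask, which is consistent with the observation that only JAL needs a pre-decoded register.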

Errors while compiling the testbench.v and picorv32.v in Modelsim

Hi Clifford,

I'm trying to implement your PicoRV32 design on an FPGA, so I need to compile both files in Modelsim for simulation. However, I got errors in both files:

for picorv32.v:
** Error: C:\Users\Cy\Desktop\za\ma\picorv32-master\picorv32-master\picorv32.v(555): A begin/end block was found with an empty body. This is permitted in SystemVerilog, but not permitted in Verilog. Please look for any stray semicolons.
** Error: C:\Users\Cy\Desktop\za\ma\picorv32-master\picorv32-master\picorv32.v(556): A begin/end block was found with an empty body. This is permitted in SystemVerilog, but not permitted in Verilog. Please look for any stray semicolons.
** Error: C:\Users\Cy\Desktop\za\ma\picorv32-master\picorv32-master\picorv32.v(557): A begin/end block was found with an empty body. This is permitted in SystemVerilog, but not permitted in Verilog. Please look for any stray semicolons.
** Error: C:\Users\Cy\Desktop\za\ma\picorv32-master\picorv32-master\picorv32.v(558): A begin/end block was found with an empty body. This is permitted in SystemVerilog, but not permitted in Verilog. Please look for any stray semicolons.
** Error: C:\Users\Cy\Desktop\za\ma\picorv32-master\picorv32-master\picorv32.v(581): A begin/end block was found with an empty body. This is permitted in SystemVerilog, but not permitted in Verilog. Please look for any stray semicolons.
** Error: C:\Users\Cy\Desktop\za\ma\picorv32-master\picorv32-master\picorv32.v(582): A begin/end block was found with an empty body. This is permitted in SystemVerilog, but not permitted in Verilog. Please look for any stray semicolons.
** Error: C:\Users\Cy\Desktop\za\ma\picorv32-master\picorv32-master\picorv32.v(589): A begin/end block was found with an empty body. This is permitted in SystemVerilog, but not permitted in Verilog. Please look for any stray semicolons.
** Error: C:\Users\Cy\Desktop\za\ma\picorv32-master\picorv32-master\picorv32.v(590): A begin/end block was found with an empty body. This is permitted in SystemVerilog, but not permitted in Verilog. Please look for any stray semicolons.

as for the testbench.v:
** Error (suppressible): C:\Users\Cy\Desktop\za\ma\picorv32-master\picorv32-master\testbench.v(79): (vlog-2388) 'trap' already declared in this scope (picorv32_wrapper).

I copied the code in this part:
module picorv32_wrapper #(
parameter AXI_TEST = 0,
parameter VERBOSE = 0
) (
input clk,
input resetn,
output trap,
output trace_valid,
output [35:0] trace_data
);

wire trap;
wire tests_passed;
reg [31:0] irq;

As far as I know, Verilog does not allow defining an internal wire or reg with the same name as a port. So maybe you can give me some thoughts?

Thanks a lot!

Best regards,

torture fails

Hi, when I try to run torture it shows the following error:

+ test -f config.vh
+ test -f test.S
++ sed '/march=/ ! d; s,^// ,-,;' config.vh
+ riscv32-unknown-elf-gcc -m32 -march=RV32IMC -ffreestanding -nostdlib -Wl,-Bstatic,-T, -o test.elf test.S
riscv32-unknown-elf-gcc: error: unrecognized command line option '-m32'
make: *** [test] Error 1

Please suggest how to resolve this issue.

Example cxxdemo needs regs 16-31

The example code for cxxdemo is compiled using the rv32ic compiler, which I believe uses the full 32 registers of the instruction set. I had to enable the high regs in the testbench, otherwise it simply gave a TRAP exit.

With the patch attached below, the test case passes.

csmith Fatal error

Hi, when I try to run csmith and torture from the scripts folder, I get the following fatal errors. Can you suggest how to resolve these issues?
csmith error:

---------------- 1 (1) ----------------
rm -f test.hex test.elf test.c test_ref test.ld output_ref.txt output_sim.txt
make spike test.hex
make[1]: Entering directory `/home/krradhak/picorv32/picorv32/scripts/csmith'
echo "integer size = 4" >
echo "pointer size = 4" >>
csmith --no-packed-struct -o test.c
gawk '/Seed:/ {print$2,$3;}' test.c
Seed: 1483867854
gcc -m32 -o test_ref -w -Os -I /home/krradhak/tools/csmith/src/csmith-AbsExtension.o test.c
test.c:10:20: fatal error: csmith.h: No such file or directory
 #include "csmith.h"
compilation terminated.
make[1]: *** [test_ref] Error 1
make[1]: Leaving directory `/home/krradhak/picorv32/picorv32/scripts/csmith'

Torture error:

+ test -f config.vh
+ test -f test.S
++ sed '/march=/ ! d; s,^// ,-,;' config.vh
+ riscv32-unknown-elf-gcc -m32 -march=RV32IMC -ffreestanding -nostdlib -Wl,-Bstatic,-T, -o test.elf test.S
line 22: riscv32-unknown-elf-gcc: command not found
make: *** [test] Error 127

Incorrect handling of SBREAK/ILLINSN with PCPI

I'm trying to put a minimal example of this together (or a patch if I get there quicker), but won't get a chance until tomorrow. I leave this here for now as a test of my own sanity.

If PCPI is enabled and an illegal instruction or sbreak occurs, then upon retirq the signal pcpi_valid remains asserted. The eventual effect of this is that when the next valid PCPI instruction comes along, it causes another SBREAK IRQ to occur, presumably because the PCPI timeout never got reset.

I actually want to handle ecall, and I guess I could do it with another PCPI extension, but that seems like a less graceful solution.

Potential power savings

There are several cases in the code with constructs like this:

cpu_state_ld_rs1: begin 
    reg_op2 <= 'bx;

There are similar constructs for reg_out and div.pcpi_rd.

I assume that this construct exists to avoid the area of the feedback mux?

I'm wondering if the synthesis tools are smart enough to insert a clock gate for those registers instead of clocking out random data at each clock cycle.

IMO, instead of hoping that this would be the case, the more conservative way would be to remove the x assignment altogether. Feedback muxes don't really exist anymore and have been replaced by clock gates anyway, so the only extra logic cost would be the one to control the clock gate.

Also, a clock gate on alu_out_q would be nice as well.

jal instruction execution doesn't align with expectation

Hi Clifford,

When I simulate picorv32 with my system, the jal instruction doesn't jump to the expected address.

As you can see from the waveform, when the current pc is 0x512, the CPU gets a jal instruction whose value is 0xbfdd in the objdump:

512: bfdd j 508 <main+0x8>

Since main starts at 0x500, I expect the pc to jump to 0x508. However, it jumps to 0x1508. The variable decoded_imm_uj is 0xff6, and when it is added to 0x512 the result is 0x1508. Should we just use the lower 12 bits to generate reg_next_pc?
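
The numbers in this report can be reproduced in software. The sketch below assumes that 0xbfdd is a compressed C.J instruction with the standard RVC immediate layout; `cj_imm_raw` and `sext12` are hypothetical helper names. It only shows the arithmetic and does not say where in the design the sign extension was lost.

```c
#include <stdint.h>

/* Decode the 12-bit jump offset of a compressed C.J instruction.
 * inst[12:2] holds imm[11|4|9:8|10|6|7|3:1|5].
 * The value is returned WITHOUT sign extension. */
static uint32_t cj_imm_raw(uint16_t inst)
{
    return (((inst >> 12) & 0x1u) << 11)
         | (((inst >> 11) & 0x1u) << 4)
         | (((inst >> 9)  & 0x3u) << 8)
         | (((inst >> 8)  & 0x1u) << 10)
         | (((inst >> 7)  & 0x1u) << 6)
         | (((inst >> 6)  & 0x1u) << 7)
         | (((inst >> 3)  & 0x7u) << 1)
         | (((inst >> 2)  & 0x1u) << 5);
}

/* Sign-extend a 12-bit value to 32 bits. */
static int32_t sext12(uint32_t v)
{
    return (int32_t)((v & 0xFFFu) ^ 0x800u) - 0x800;
}
```

For 0xbfdd the raw field is 0xff6. Added to 0x512 as an unsigned value it yields 0x1508, the address seen in the waveform; sign-extended it is -10, which yields the expected 0x508.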

Could you please have a look at this? Thank you very much!

log.txt is the simulation log with the DEBUG macro added for picorv32
mem.txt is the memory initialization file



io constraints

Hi Clifford,
I am trying to implement this block and run STA on it using the open-source qflow and OpenTimer. Could you tell me which IO constraints need to be used? Any pointer or PDF will help. I am using a target clock frequency of 400MHz to start with.

LH instruction causes MISALIGNED HALFWORD error

I am trying to run the picorv32 core with somewhat more serious software and hit the issue below:

DECODE: 0x000066d8 0x00c59783 lh
LD_RS1: 11 0x00000113
TRAP after 1545 clock cycles

The code at address 0x66d8 is from the riscv-newlib fflush() function (toolchain compiled as rv32i).
000066b0 <_fflush_r>:
66b0: fe010113 addi sp,sp,-32
66b4: 00812c23 sw s0,24(sp)
66b8: 00112e23 sw ra,28(sp)
66bc: 00050413 mv s0,a0
66c0: 00050c63 beqz a0,66d8 <_fflush_r+0x28>
66c4: 03852783 lw a5,56(a0)
66c8: 00079863 bnez a5,66d8 <_fflush_r+0x28>
66cc: 00b12623 sw a1,12(sp)
66d0: 188000ef jal ra,6858 <__sinit>
66d4: 00c12583 lw a1,12(sp)
66d8: 00c59783 lh a5,12(a1)
66dc: 00078c63 beqz a5,66f4 <_fflush_r+0x44>
66e0: 00040513 mv a0,s0
66e4: 01812403 lw s0,24(sp)
66e8: 01c12083 lw ra,28(sp)
66ec: 02010113 addi sp,sp,32
66f0: db9ff06f j 64a8 <__sflush_r>
66f4: 01c12083 lw ra,28(sp)
66f8: 01812403 lw s0,24(sp)
66fc: 00000513 li a0,0
6700: 02010113 addi sp,sp,32
6704: 00008067 ret

Any idea?
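
A quick arithmetic check on the log, assuming that `LD_RS1: 11 0x00000113` means register x11 (a1) holds 0x113 when the lh executes. The helper names `load_address` and `halfword_aligned` are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Effective address of a load: base register plus sign-extended 12-bit offset. */
static uint32_t load_address(uint32_t base, int32_t offset)
{
    return base + (uint32_t)offset;
}

/* A halfword access is naturally aligned when bit 0 of the address is clear. */
static bool halfword_aligned(uint32_t addr)
{
    return (addr & 0x1u) == 0;
}
```

Under that reading, `lh a5,12(a1)` accesses 0x113 + 12 = 0x11f, an odd address, so a misaligned halfword trap would be the expected reaction. The more interesting question would then be why a1 holds an odd value at that point.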

Yanghao Hua

asm call needs volatile

Probably not an issue in your use case, but I had this piece of code compiled away because of the missing volatile:

unsigned int start_time = time();
unsigned int elapsed_time = start_time;
while (elapsed_time < MS_TO_CYCLES(100)){
    elapsed_time = time() - start_time;
}
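
For illustration, a cycle-counter read with the qualifier in place might look like the sketch below. It is not the code from the repository: `read_cycles` and `wait_cycles` are hypothetical names, and the non-RISC-V branch is only a stand-in so that the example also builds on a host machine.

```c
#include <stdint.h>

/* Read the cycle counter. `volatile` tells the compiler that the asm must
 * not be merged with other calls or hoisted out of a loop. */
static inline uint32_t read_cycles(void)
{
#if defined(__riscv)
    uint32_t cycles;
    __asm__ volatile ("rdcycle %0" : "=r"(cycles));
    return cycles;
#else
    /* Host-side stand-in (assumption): a counter that advances per call. */
    static uint32_t fake_cycles;
    fake_cycles += 1000u;
    return fake_cycles;
#endif
}

/* Busy-wait until at least `cycles` cycles have elapsed; returns the elapsed count. */
static uint32_t wait_cycles(uint32_t cycles)
{
    uint32_t start = read_cycles();
    uint32_t elapsed = 0;
    while (elapsed < cycles)
        elapsed = read_cycles() - start;   /* unsigned subtraction tolerates wrap-around */
    return elapsed;
}
```

Without `volatile`, the compiler is allowed to treat the asm as a pure function of its inputs, which is how a polling loop like the one above can end up optimized away.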

volatile is also missing here:

Default picosoc register file doesn't support q regs


Today I discovered that the default PicoSoc configuration doesn't seem to support the q registers, which are used for saving registers in IRQ handlers.

The test case that I used was a basic IRQ handler in ASM. In my test case, this worked fine:


.. but this didn’t:

    // backup x10/x11 in q2/3
    picorv32_setq_insn(q2, x10)
    picorv32_setq_insn(q3, x11)

    // modify X10/X11
    addi x10, zero, 0
    addi x11, zero, 0

    // restore x10 and x11 from Q registers
    picorv32_getq_insn(x10, q2)
    picorv32_getq_insn(x11, q3)

    // return from IRQ

It turned out that the issue was in the default register bank provided by PicoSoc, and the fact that picorv32 uses registers r32-r35 for the Q registers. The default register bank truncates addresses to 5 bits, so the q registers don't work.

module picosoc_regs (
    input clk, wen,
    input [5:0] waddr,
    input [5:0] raddr1,
    input [5:0] raddr2,
    input [31:0] wdata,
    output [31:0] rdata1,
    output [31:0] rdata2
);
    reg [31:0] regs [0:31];

    always @(posedge clk)
        if (wen) regs[waddr[4:0]] <= wdata;  // <---- address truncated

    assign rdata1 = regs[raddr1[4:0]];   // <---- address truncated
    assign rdata2 = regs[raddr2[4:0]];   // <---- address truncated
endmodule
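
The effect of the truncation can be shown with a small software model, assuming (as described above) that q0..q3 live at register indices 32..35. `regbank32_t`, `bank_write` and `bank_read` are hypothetical names for this sketch.

```c
#include <stdint.h>

/* Software model of a 32-entry register bank addressed with 6-bit indices
 * of which only the low 5 bits are used, like the module shown above. */
typedef struct {
    uint32_t regs[32];
} regbank32_t;

static void bank_write(regbank32_t *bank, unsigned addr6, uint32_t value)
{
    bank->regs[addr6 & 0x1Fu] = value;   /* address truncated to 5 bits */
}

static uint32_t bank_read(const regbank32_t *bank, unsigned addr6)
{
    return bank->regs[addr6 & 0x1Fu];    /* address truncated to 5 bits */
}
```

In this model a write to index 34 (q2) lands in entry 2, so saving a value to q2 silently overwrites x2 instead of preserving anything.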

I've got a pull request ready to go - I'll send it through for review shortly.


dhrystone wrong

I have used the given dhrystone. Because I have no iverilog, I changed it to use the vcs tool. But after running it, when I check the resulting waveform, it just stops reading data at 18382 cycles, and the value computed by 100*10^6/(1757*18382) is too large. Do you know how to get the right Dhrystone result?

problems about the docs

While studying the picorv32 I was confused by a few things that are not documented. Where can I find descriptions of, or ask for help on, the following?

  1. The interrupt vector table: from the disassembled code of the example, it seems that RAM addresses 0-F are all used for the reset handler. How should I arrange the vectors, or is it just like in the original RISC-V?

  2. Is there a debug unit (I mean support for breakpoints, single-stepping, ...)? It would be important for FPGA debugging, but I cannot find it in the README.

  3. Is there a detailed PCPI description? I cannot find the modules for the PCPI functions in the README, and the PCPI interface always toggles even when the PCPI parameter is disabled.




I am targeting Alpine Linux.

It has been decided that the UNIX/Linux platform ABIs will mandate the A extension. The latest Linux port has removed kernel support for atomic emulation (the cmpxchg syscall), so A is now mandatory for the musl-riscv port.

Can you see picorv32 supporting the A extension?


Write Address and Data undefined with simple hello world program

I'm attempting to write a simple hello world function to run on picorv32 in sim. I'm running in the scripts cxxdemo folder. I can run make test just fine, but when I try to write a custom Hello World example I get an undefined write address and data: WR: ADDR=xxxxxxxX DATA=xxxxxxxx MASK=1111

#include <stdio.h>
#include <iostream>
#include <vector>
#include <algorithm>

int main() {

	printf("Hello World!\n");

	return 0;
}

Makefile (modified from the cxxdemo Makefile)

RISCV_TOOLS_PREFIX = /opt/riscv32ic/bin/riscv32-unknown-elf-
CXXFLAGS = -MD -Os -Wall -std=c++11
CCFLAGS = -MD -Os -Wall -std=c++11
LDFLAGS = -Wl,--gc-sections
LDLIBS = -lstdc++

test: testbench.vvp firmware32.hex
	vvp -N testbench.vvp

hello: testbench_hello.vvp hello32.hex
	vvp -N testbench_hello.vvp

testbench.vvp: testbench.v ../../picorv32.v
	iverilog -o testbench.vvp testbench.v ../../picorv32.v
	chmod -x testbench.vvp

testbench_hello.vvp: testbench_hello.v ../../picorv32.v
	iverilog -o testbench_hello.vvp testbench_hello.v ../../picorv32.v
	chmod -x testbench_hello.vvp

firmware32.hex: firmware.elf start.elf
	$(RISCV_TOOLS_PREFIX)objcopy -O verilog start.elf start.tmp
	$(RISCV_TOOLS_PREFIX)objcopy -O verilog firmware.elf firmware.tmp
	cat start.tmp firmware.tmp > firmware.hex
	python3 firmware.hex > firmware32.hex
	rm -f start.tmp firmware.tmp

hello32.hex: hello.elf
	$(RISCV_TOOLS_PREFIX)objcopy -O verilog hello.elf hello.tmp
	cat hello.tmp > hello.hex
	python3 hello.hex > hello32.hex
	rm -f hello.tmp

firmware.elf: firmware.o syscalls.o
	$(CC) $(LDFLAGS) -o $@ $^ -T ../../firmware/riscv.ld $(LDLIBS)
	chmod -x firmware.elf

hello.elf: hello.o
	$(CC) $(LDFLAGS) -o $@ $^ -T ../../firmware/riscv.ld $(LDLIBS)
	chmod -x hello.elf

start.elf: start.S start.ld
	$(CC) -nostdlib -o start.elf start.S -T start.ld $(LDLIBS)
	chmod -x start.elf

clean:
	rm -f *.o *.d *.tmp start.elf
	rm -f firmware.elf firmware.hex firmware32.hex
	rm -f hello.elf hello.hex hello32.hex
	rm -f testbench.vvp testbench.vcd
	rm -f testbench_hello.vvp

-include *.d
.PHONY: test clean

testbench_hello.v (modified from testbench.v)

`timescale 1 ns / 1 ps
//`undef VERBOSE_MEM
//`undef WRITE_VCD
`define WRITE_VCD
`undef MEM8BIT

module testbench_hello;
	reg clk = 1;
	reg resetn = 0;
	wire trap;

	always #5 clk = ~clk;

	initial begin
		repeat (100) @(posedge clk);
		resetn <= 1;
	end

	wire mem_valid;
	wire mem_instr;
	reg mem_ready;
	wire [31:0] mem_addr;
	wire [31:0] mem_wdata;
	wire [3:0] mem_wstrb;
	reg  [31:0] mem_rdata;

	picorv32 #(
	) uut (
		.clk         (clk        ),
		.resetn      (resetn     ),
		.trap        (trap       ),
		.mem_valid   (mem_valid  ),
		.mem_instr   (mem_instr  ),
		.mem_ready   (mem_ready  ),
		.mem_addr    (mem_addr   ),
		.mem_wdata   (mem_wdata  ),
		.mem_wstrb   (mem_wstrb  ),
		.mem_rdata   (mem_rdata  )
	);

	localparam MEM_SIZE = 4*1024*1024;
`ifdef MEM8BIT
	reg [7:0] memory [0:MEM_SIZE-1];
	initial $readmemh("hello.hex", memory);
`else
	reg [31:0] memory [0:MEM_SIZE/4-1];
	initial $readmemh("hello32.hex", memory);
`endif

	always @(posedge clk) begin
		mem_ready <= 0;
		if (mem_valid && !mem_ready) begin
			mem_ready <= 1;
			mem_rdata <= 'bx;
			case (1)
				mem_addr < MEM_SIZE: begin
`ifdef MEM8BIT
					if (|mem_wstrb) begin
						if (mem_wstrb[0]) memory[mem_addr + 0] <= mem_wdata[ 7: 0];
						if (mem_wstrb[1]) memory[mem_addr + 1] <= mem_wdata[15: 8];
						if (mem_wstrb[2]) memory[mem_addr + 2] <= mem_wdata[23:16];
						if (mem_wstrb[3]) memory[mem_addr + 3] <= mem_wdata[31:24];
					end else begin
						mem_rdata <= {memory[mem_addr+3], memory[mem_addr+2], memory[mem_addr+1], memory[mem_addr]};
					end
`else
					if (|mem_wstrb) begin
						if (mem_wstrb[0]) memory[mem_addr >> 2][ 7: 0] <= mem_wdata[ 7: 0];
						if (mem_wstrb[1]) memory[mem_addr >> 2][15: 8] <= mem_wdata[15: 8];
						if (mem_wstrb[2]) memory[mem_addr >> 2][23:16] <= mem_wdata[23:16];
						if (mem_wstrb[3]) memory[mem_addr >> 2][31:24] <= mem_wdata[31:24];
					end else begin
						mem_rdata <= memory[mem_addr >> 2];
					end
`endif
				end
				mem_addr == 32'h 1000_0000: begin
					$write("%c", mem_wdata[7:0]);
				end
			endcase
		end
		if (mem_valid && mem_ready) begin
			if (|mem_wstrb)
				$display("WR: ADDR=%x DATA=%x MASK=%b", mem_addr, mem_wdata, mem_wstrb);
			else
				$display("RD: ADDR=%x DATA=%x%s", mem_addr, mem_rdata, mem_instr ? " INSN" : "");
			if (^mem_addr === 1'bx ||
					(mem_wstrb[0] && ^mem_wdata[ 7: 0] == 1'bx) ||
					(mem_wstrb[1] && ^mem_wdata[15: 8] == 1'bx) ||
					(mem_wstrb[2] && ^mem_wdata[23:16] == 1'bx) ||
					(mem_wstrb[3] && ^mem_wdata[31:24] == 1'bx)) begin
				$finish; // assumed: stop on an undefined transaction, as the upstream testbench does
			end
		end
	end

`ifdef WRITE_VCD
	initial begin
		$dumpvars(0, testbench_hello);
	end
`endif

	always @(posedge clk) begin
		if (resetn && trap) begin
			repeat (10) @(posedge clk);
			$finish; // assumed: stop after a trap, as the upstream testbench does
		end
	end
endmodule

I seem to get similar results when running in the firmware folder, but there I don't get very far, whereas this VCD appears to get many cycles in. Is it possible that some kind of cleanup code needs to be executed in order to exit correctly?

Xilinx ISE 14.7 synthesis

A few changes are needed for successful synthesis by the legacy Xilinx ISE toolchain:

  • xst incorrectly implements the register file with ENABLE_REGS_DUALPORT = 1, but ENABLE_REGS_DUALPORT = 0 works fine
  • xst doesn't like memory in an always @* combinational block
  • xst has some problems with parameterized macros

I did look at issues #2 and #25, and I'm a little puzzled as to how synthesis even completed, unless they made the same changes to the RTL as I did. I haven't seen problems with the PC register once the design is in runnable shape.

Attached is an example project targeting the Spartan 3E Starter board. It contains a picorv32 core with the plain memory interface, some block RAM, a UART, and some test software in C.

I'm just putting this out there in the hope it will make picorv32 more useful for older Spartan-6 or Spartan-3E designs; the main repository probably doesn't want to deal with irksome `ifdef OLD_XILINX stuff. I have some Spartan-6 hardware for which I'd like to use picorv32 (currently using picoblaze).


PicoRV32 Documents

Hi, I am interested in learning PicoRV32. Could you add architecture and pipeline documentation and a datasheet for PicoRV32?

Possible waitirq stall

I've been debugging some IRQ related stuff I'm doing, and although this issue isn't related to that, staring at this stuff did make me wonder.

The waitirq instruction will sit and wait for an unmasked interrupt to take place, and then store the pending interrupt list into rd, so you can see what interrupts were serviced upon progressing.

However, waitirq is dependent on irq_pending, which has the IRQ mask applied to it (obviously).

But what happens if:

  1. IRQ is raised on a masked input.
  2. I execute maskirq to unmask it.
  3. My ISR is run.
  4. I then execute waitirq.

Perhaps the interrupt I was expecting got serviced between maskirq and waitirq, which could happen if it was raised any time before waitirq gets executed, right?

Is there a defect in my reasoning here, or would a stall be possible? If this sounds sane, I can try to write a simulation example that causes it.

scall instruction causes trap instead of interrupt

I'm working on porting an RTOS to picorv32. scall doesn't seem to be implemented even though it's part of the user-level ISA. Is there a reason for this? I think this would fix the problem:

instr_sbreak <= !CATCH_ILLINSN && ((mem_rdata_q[6:0] == 7'b1110011 && (mem_rdata_q[31:7] == 'b0000000000010000000000000 || mem_rdata_q[31:7] == 'b0000000000000000000000000)) ||
                    (COMPRESSED_ISA && mem_rdata_q[15:0] == 16'h9002));
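
The encodings matched by this expression can be written out as a small classifier. This is a sketch using the current instruction names (ECALL and EBREAK, formerly SCALL and SBREAK); the enum and `classify_system_insn` are hypothetical names.

```c
#include <stdint.h>

typedef enum { INSN_OTHER, INSN_ECALL, INSN_EBREAK, INSN_C_EBREAK } sys_insn_t;

/* Classify the environment call/break encodings.
 * ECALL    = 0x00000073 (inst[31:7] all zero, opcode 1110011)
 * EBREAK   = 0x00100073 (only inst[20] set above the opcode)
 * C.EBREAK = 0x9002     (16-bit compressed encoding) */
static sys_insn_t classify_system_insn(uint32_t inst)
{
    if ((inst & 0x3u) != 0x3u)   /* 16-bit encoding: only the low halfword counts */
        return ((inst & 0xFFFFu) == 0x9002u) ? INSN_C_EBREAK : INSN_OTHER;
    if (inst == 0x00000073u)
        return INSN_ECALL;
    if (inst == 0x00100073u)
        return INSN_EBREAK;
    return INSN_OTHER;
}
```

Keeping the two 32-bit cases distinct, rather than folding them into one signal, would let an IRQ handler tell a system call from a breakpoint.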

very large clocked process makes it impossible to bring out non-clocked signals

In the current code, there are some huge clocked always clauses, like the one that starts on line 1168.

There is nothing fundamentally wrong with this as long as you don't need to bring out anything combinationally.

The problem is that this is what I'd like to do. :-)

The immediate issue is that, for one reason or the other, cpuregs is not detected by Quartus as a memory block, so it consumes tons of regular registers. (From your documentation, it seems that Vivado is detecting this just fine.)

I'd like to experiment by replacing constructs like this:
cpuregs[latched_rd] <= reg_pc + (latched_compr ? 2 : 4);

with this:

cpuregs_wr = 1'b1;
cpuregs_addr = latched_rd;
cpuregs_wdata = reg_pc + (latched_compr ? 2 : 4);

and then have 1 place with a single cpuregs[cpuregs_addr] ... clause.

If that still doesn't do the trick, I'd even instantiate an Altera memory macro to force a memory block.

With the current clocked always block, such an experiment requires a full-out rewrite of the whole always block instead of a small surgical patch.

Would you be open to converting this clocked process into 2 processes, a purely combinational one and a clocked one that just contains the registers? I'm willing to do the work if you just tell me the naming convention. My standard convention is:

<var_name>_nxt = ....

always @(posedge clk)
<var_name> <= <var_name>_nxt;

But I'm willing to use whatever way you prefer.

(I'm NOT asking to make the change for the cpuregs isolation itself; that could be a forked branch on my side if you think it makes the code look too ugly.)

In addition to cpuregs experiments, such a change would also make it possible to later insert logic to scan out register contents via jtag etc.

synthesis mismatch with the latest rtl on xilinx spartan6 fpga

Hi Clifford

I tried targeting the picorv32 to a Spartan-6 (xc6slx25) board. The RTL simulation is OK, but the FPGA didn't work. I then ran an experiment with the firmware you provided, and it seems that the post-synthesis netlist has something wrong with the PC register: the PC advances by 0xC rather than 0x4 at a point where I think there is no jump instruction.

Could you help to have a look at this? Thank you very much!

In the waveform bad_wave_netlist_sim, the 62nd (counting from 1) mem_axi_araddr is 0x780, while the previous one is 0x774. In the good waveform the mem_axi_araddr is 0x778; this is the mismatch point.

picorv32_netlist.txt is the post-synthesis netlist.

The testbench, firmware.hex and RTL are the latest ones, with id: ef86b30



Will there be a PicoRV64?

I just think it would be cool if an RV64G design (Rocket?) could fit onto one of the ICE40 FPGAs so there was a relatively easy and open source way to hack on these. I think softfp would actually be fine for porting most software.

I just wanted to talk about this, really. It's not an issue per se.

Cannot find module SB_IO

I was trying to adapt the hx8kdemo for the Zedboard, but when I try to build the project in Vivado the module flash_io_buf doesn't appear.

Unable to build testbench waveform in ubuntu

I tried building testbench.vcd on Ubuntu 12.04 with Vivado and

got the error below.

prashantravi@ubuntu:/picorv32/picorv32$ make testbench.vcd
iverilog -o testbench.exe testbench.v picorv32.v
picorv32.v:41: syntax error
I give up.
make: *** [testbench.exe] Error 2

Please help

Little error on line testbench.v:190


I'd like to report a possible syntax error on line testbench.v:190

output            mem_axi_rvalid = 0,

should be

output reg            mem_axi_rvalid = 0, 

This was tested using Simulation in Vivado.

Make fails

Hi, when I try to run make test or any other make command, it produces the following errors:

picorv32.v:1878: error: invalid module item.
picorv32.v:1879: syntax error
picorv32.v:1879: error: Invalid module instantiation
picorv32.v:1880: error: Invalid module instantiation
picorv32.v:1881: error: Invalid module instantiation
picorv32.v:1885: error: invalid module item.
picorv32.v:1887: syntax error
Can you give guidance on correcting the above issue?

Reset coding style

I just got started building my own test bench for
picorv32.v and noticed that there are 'bx values on the output
ports of picorv32 after the reset. Would it be good coding
style to add initial values for the FF output ports?

Question: Runtime code modification: Is it possible?

It's a crazy idea (to which the answer is probably 'no'), but I wanted to ask whether it's possible to emit machine code at runtime (either in flash or SRAM).

Sorry for the stupid question. I'm not looking for a complete example, I just want to know if it's possible. I know that picorv32 does not implement the RISC-V privileged mode and that's why I'm asking.
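
To make the question concrete, "emitting machine code" here means writing encoded instruction words into a buffer. The sketch below only shows that encoding step for two RV32I instructions, following the I-type layout from the RISC-V base ISA. The helper names `rv32_itype` and `emit_add_const` are hypothetical. Whether the core then fetches and executes freshly written words correctly depends on the memory system and is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an RV32I I-type instruction:
 * imm[11:0] | rs1 | funct3 | rd | opcode */
uint32_t rv32_itype(uint32_t opcode, uint32_t funct3,
                    uint32_t rd, uint32_t rs1, int32_t imm)
{
    assert(rd < 32 && rs1 < 32 && funct3 < 8);
    assert(imm >= -2048 && imm <= 2047); /* 12-bit signed immediate */
    return (((uint32_t)imm & 0xfffu) << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | (opcode & 0x7fu);
}

/* Hypothetical helper: write "addi a0, a0, imm; ret" into buf,
 * which must have room for two 32-bit words. */
void emit_add_const(uint32_t *buf, int32_t imm)
{
    buf[0] = rv32_itype(0x13, 0, 10, 10, imm); /* addi a0, a0, imm */
    buf[1] = rv32_itype(0x67, 0, 0, 1, 0);     /* jalr x0, 0(x1), i.e. ret */
}
```

Running the emitted words would then be a matter of casting the buffer address to a function pointer, assuming the buffer sits in memory the core can fetch instructions from; that step is left out here and has not been verified on PicoRV32.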
