jlpteaching / dinocpu

A teaching-focused RISC-V CPU design used at UC Davis

License: BSD 3-Clause "New" or "Revised" License

dinocpu's Introduction

Davis In-Order (DINO) CPU models

This repository contains Chisel implementations of the CPU models from Patterson and Hennessy's Computer Organization and Design (RISC-V Edition) primarily for use in UC Davis's Computer Architecture course (ECS 154B).

Other educators are highly encouraged to take this repository and modify it to meet the needs of their classes. Please open an issue with any questions or feature requests. We would also appreciate contributions via pull requests!

We published a summary paper on DINO CPU at the Workshop on Computer Architecture Education held with ISCA '19.

Jason Lowe-Power and Christopher Nitta. 2019. The Davis In-Order (DINO) CPU: A Teaching-focused RISC-V CPU Design. In Workshop on Computer Architecture Education (WCAE’19), June 22, 2019, Phoenix, AZ, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3338698.3338892

The repository was originally cloned from https://github.com/ucb-bar/chisel-template.git.

Getting the code

To get the code, you can clone the repository that is in jlpteaching: jlpteaching/dinocpu.

git clone https://github.com/jlpteaching/dinocpu.git

Overview of code

The src/ directory:

main/scala/
- components/: This contains a number of components that are needed to implement a CPU. You will be filling in some missing pieces to these components in this lab. You can also find all of the interfaces between components defined in this directory.
- pipelined/: This is the code for the pipelined CPU. Right now, this is just an empty template. You will implement this in Lab 4.
- single-cycle/: This is the code for the single cycle CPU. Right now, this is just an empty template. You will implement this in Lab 2.
- configuration.scala: Contains a simple class to configure the CPU. Do not modify.
- elaborate.scala: Contains a main function to output FIRRTL- and Verilog-based versions of the CPU design. You can use this file by executing runMain dinocpu.elaborate in sbt. More details below. Do not modify.
- simulate.scala: Contains a main function to simulate your CPU design. This simulator is written in Scala using the Treadle execution engine. You can execute the simulator by using runMain dinocpu.simulate from sbt. This will allow you to run real RISC-V binaries on your CPU design. More detail about this will be given in Lab 2. Do not modify.
- top.scala: A simple Chisel file that hooks up the memory to the CPU. Do not modify.
test/
- java/: This contains some Gradescope libraries for automated grading. Feel free to ignore.
- resources/riscv: RISC-V test applications that we will use to test your CPU design. You can also use them to test it yourself.
- scala/
  - components/: Tests for the CPU components/modules. You may want to add additional tests here. Feel free to modify, but do not submit!
  - cpu-tests/: Tests the full CPU design. You may want to add additional tests here in future labs. Feel free to modify, but do not submit!
  - grading/: The tests that will be run on Gradescope. Note: these won't work unless you are running inside the Gradescope docker container. They should match the tests in components and cpu-tests. Do not modify. (You can modify, but it will be ignored when uploading to Gradescope.)

The documentation directory contains some documentation on the design of the DINO CPU as well as an introduction to the Chisel constructs required for the DINO CPU.

How to run

First you should set up the Singularity container or follow the documentation for installing Chisel.

There are three primary ways to interact with the DINO CPU code.

- Running the DINO CPU tests.
- Running the DINO CPU simulator.
- Compiling the Chisel hardware description code into Verilog.

Compiling into Verilog

To compile your Chisel design into Verilog, you can run the elaborate main function and pass a parameter for which design you want to compile. As an example:

sbt:dinocpu> runMain dinocpu.elaborate single-cycle

The generated Verilog will be available in the root folder as Top.v along with some metadata. You may also get some generated Verilog for auxiliary devices like memory as Top.<device_name>.v.

Compiling code for DINO CPU

See Compiling code to run on DINO CPU for details on how to compile baremetal RISC-V programs and compile full C applications.

Documentation for Teachers

DINO CPU-based assignments

The assignments directory contains some assignments that we have used at UC Davis with the DINO CPU.

Assignment 1: Introduction assignment which begins the design of the DINO CPU with implementing the R-type instructions only.
Assignment 2: A full implementation of a single-cycle RISC-V CPU. This assignment walks students through each type of RISC-V instruction.
Assignment 3: Pipelining. This assignment extends assignment 2 to a pipelined RISC-V design.
Assignment 4: Adding a branch predictor. In this assignment, students implement two different branch predictors and compare their performance.

dinocpu's Issues

Re-think the control unit

The control unit is not particularly intuitive. I threw it together as I was making the CPU. But now that I can see the whole design, some of the outputs don't make as much sense. This can be re-written.

Run/fix the RISC-V inst tests

There are a ton of tests for each RISC-V instruction in the RISC-V tests repo.

Unfortunately, because DINO CPU is missing some instructions (e.g., csr instructions) they don't run out of the box.

It might make sense to implement the host-guest interface from the risc-v tools, too, or to improve ours. Maybe this isn't related to this issue, though.

Update small application tests to end at the right time

Rather than continuing to execute past the end, we should check to make sure they end at the right cycle. The wrinkle here is that the branch predictors will mess this up.

Test coverage

Add coverage to testing suite. It might help create better tests.

Idea for lab 1

Have an input to the immediate generator that says what type of immediate to do. Have the students then generate all of the different immediates based on that instead of based on the opcode.
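
To make the idea concrete, here is a minimal software sketch in plain Scala (not Chisel) of an immediate generator that is driven by an explicit immediate type rather than by the opcode. The object name, the type names, and the function signature are all assumptions made for illustration; only the bit layouts come from the RISC-V base ISA.

```scala
// Software model only (plain Scala, not Chisel). All names here are hypothetical.
object ImmGenSketch {
  sealed trait ImmType
  case object IType extends ImmType
  case object SType extends ImmType
  case object BType extends ImmType
  case object UType extends ImmType
  case object JType extends ImmType

  // `>>` is an arithmetic shift on Int, so it sign-extends from bit 31.
  // `>>>` is a logical shift, used to pick out unsigned bit fields.
  def immediate(inst: Int, kind: ImmType): Int = kind match {
    case IType => inst >> 20
    case SType => ((inst >> 25) << 5) | ((inst >>> 7) & 0x1f)
    case BType =>
      ((inst >> 31) << 12) | (((inst >>> 7) & 0x1) << 11) |
        (((inst >>> 25) & 0x3f) << 5) | (((inst >>> 8) & 0xf) << 1)
    case UType => inst & 0xfffff000
    case JType =>
      ((inst >> 31) << 20) | (((inst >>> 12) & 0xff) << 12) |
        (((inst >>> 20) & 0x1) << 11) | (((inst >>> 21) & 0x3ff) << 1)
  }
}
```

For example, `ImmGenSketch.immediate(0x01100513, ImmGenSketch.IType)` evaluates to 17 for `addi x10, x0, 17`. In a lab, the type selector would presumably come from the control unit, so students would only need to implement the five bit layouts.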

Fix DontCares

According to @nganjehloo the rule of thumb should be

This is OK

val wire = DontCare

But this is not OK

wire := DontCare

It would be better if we can remove all of the DontCare. Explicitly using 0 would be better.

Update to current mainline chisel, etc.

Last quarter we used the snapshot versions of chisel, firrtl, iotesters, etc. This quarter we should try to use the mainline versions.

However, we depend on some 3.2 features, so we may need to keep using the snapshots.

See https://github.com/freechipsproject/chisel3/releases

Add special bubble symbol

Mostly for visualization and debugging.

Rather than using 0 for bubbles, let's use something else (e.g., BUBBLE)

Remove the five cycle design

This isn't used and, TBH, doesn't make much sense. It will also simplify the tests slightly to remove it.
@DanG100, If you're working on the tests, this might be something to help out with :).

Re do all of the lab tests for instructions

Right now, the tests are a mess. There is a lot of duplicated code between lab 2 and lab 3.

This should all be factored out into the InstTests and we should use the information in those tests to run the other tests.

auipc tests for lab 3

The AUIPC tests in lab 3 part 1 are confusing since some of them take more than one instruction.

We should probably move those to the multi cycle tests next time.

Or, we could have some "simple" multi cycle tests that don't require forwarding.

Add more workloads

It would be great if we could have bigger more interesting workloads. Using the RISC-V tests would be a good place to start.

Put on FPGA

This would also be really cool. Then, we could run much bigger workloads!

Create hierarchy in classes

We now have a lot of different classes and they are all in the dinocpu namespace!

We should take some constants, etc. (e.g., for the CSRs) and put them in their own namespace. We could also probably make a namespace for the memory interfaces.

Parameterize memory block sizes and modularize specific memory code

The memory is currently hardcoded to handle 32-bit block sizes. This is sufficient for our current purposes, but for involved implementations like caches which usually use 64-bit blocks we must be able to support arbitrary block sizes.

This will be best done with a blocksize parameter included in the CPU's configuration. This will be passed along with the latency into the backing memory and instruction/data memory ports' constructors, which initialize the inner wiring with the appropriate bit widths. I attempted to do this in one single side commit but there were clearly some aspects of the memory that required a more directed, thoughtful approach to parameterization.

We should also split the memory mask mode and sign-extension code into a function and generalize the logic for any block size. At the moment, the synchronous and asynchronous memories both use the same copy-pasted masking and sign-extension logic, and both instances are hardcoded to 32-bit block widths. Splitting this into its own function would make debugging and maintenance much easier for us.
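
As a rough illustration of what a shared, block-size-independent helper could look like, here is a plain Scala software model (not Chisel). The object name, the function name, and its parameters are hypothetical and are not taken from the repository. The block is modeled as a non-negative BigInt so that any block width works.

```scala
// Software model only. `MemMaskSketch.extract` is a hypothetical helper, not repository code.
object MemMaskSketch {
  // block:      the raw block read from memory, as a non-negative value of any width
  // byteOffset: position of the access within the block, in bytes
  // sizeBytes:  access size (1 = byte, 2 = halfword, 4 = word, ...)
  // signExtend: whether to interpret the selected bits as a signed value
  def extract(block: BigInt, byteOffset: Int, sizeBytes: Int, signExtend: Boolean): BigInt = {
    val bits = sizeBytes * 8
    val mask = (BigInt(1) << bits) - 1
    // Shift the requested bytes down to bit 0, then mask off everything above them.
    val raw = (block >> (byteOffset * 8)) & mask
    // If the top selected bit is set, subtract 2^bits to get the signed value.
    if (signExtend && raw.testBit(bits - 1)) raw - (BigInt(1) << bits) else raw
  }
}
```

With a 64-bit block `BigInt("8899aabbccddeeff", 16)`, reading one byte at offset 1 gives 0xee unsigned and -18 when sign-extended. Nothing in the function depends on the block being 32 bits wide.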

Make it easier to specify config params on the command line

Right now, the code is duplicated in simulate and single step to take a command line name for a configuration and set up the CPUConfig(). See, for instance, https://github.com/jlpteaching/dinocpu-private/blob/master/src/main/scala/simulate.scala#L111.

We should put this code inside CPUConfig instead. Thus, it would also be possible to use the same syntax for specifying command line-based parameters to the elaborate function.

More detailed test output

It would be helpful if the tester outputted what didn't match for the test, for example "reg 5 was 0 expected 1234".
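
A minimal sketch of how such messages could be produced, written as plain Scala over maps of register values. The object name, the function name, and the idea of passing expected and actual values as maps are assumptions for illustration; the real tester would obtain the actual values from the simulator.

```scala
// Hypothetical helper, not part of the existing test drivers.
object RegDiffSketch {
  // Returns one message per register whose actual value differs from the expected one.
  // Registers missing from `actual` are treated as holding 0 (an assumption of this sketch).
  def mismatches(expected: Map[Int, BigInt], actual: Map[Int, BigInt]): Seq[String] =
    expected.toSeq.sortBy(_._1).collect {
      case (reg, want) if actual.getOrElse(reg, BigInt(0)) != want =>
        s"reg $reg was ${actual.getOrElse(reg, BigInt(0))} expected $want"
    }
}
```

An empty result means every checked register matched, so the tester could fail the test and print the returned lines only when the sequence is non-empty.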

Add a pipeline stage register interface and module

Right now, especially for the non-combinational pipeline, there are way too many notions of what "bubbling" does to a stage's register. With the combinational pipeline, bubbling "pauses" or "freezes" IF/ID and flushes the control signals of ID/EX and EX/MEM. With the non-combinational pipeline, however, it will likely be necessary to bubble, freeze, and flush every stage of the pipeline, so what a bubbling operation actually does will get confusing for new users.

I suggest instead implementing a new module that wraps these pipeline registers and provides a common interface for bubbling and flushing them. In particular, it should provide valid, freeze, and flush input signals, and a good output signal, as well as input and output bundles for the actual contents of the registers.

The valid signal should tell the module to write the contents of the input bundle into the register. This is necessary for delayed memory, as a valid memory response is not immediately guaranteed and so we would have to watch for garbage input data.

The freeze signal should tell the module to ignore the valid signal. This has the effect of bubbling the register, similar to the IF/ID stage.

Lastly, the flush signal should have the module zero out the contents of the register on the next cycle. For compatibility this would zero out just the control signals of ID/EX and EX/MEM, but a simpler implementation would be to split these stage registers into two modules (one for control signals, the other for data signals). This approach also allows more granularity in a bubbling operation: we could freeze the control signals, but zero out the data, or vice versa.
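
The proposed semantics can be sketched as a small behavioral model in plain Scala (not Chisel). The class name and method names are hypothetical, the "good" output is left out because its meaning is not spelled out above, and giving flush priority over freeze is an assumption of this sketch rather than something the proposal states.

```scala
// Behavioral software model only. All names are hypothetical.
final class StageRegSketch[T](zero: T) {
  private var contents: T = zero

  // Current contents of the stage register.
  def out: T = contents

  // Models one clock edge.
  //   flush:  zero the contents (assumed here to take priority over freeze)
  //   freeze: ignore `valid` and hold the current contents
  //   valid:  write `in` into the register
  def tick(in: T, valid: Boolean, freeze: Boolean, flush: Boolean): Unit = {
    if (flush) contents = zero
    else if (valid && !freeze) contents = in
    // Otherwise hold the old value.
  }
}
```

Splitting a stage into one instance for control signals and one for data signals, as suggested above, would then let each half be frozen or flushed independently.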

Make disassembler

Instead of printing "DASM(inst)" for the spike-dasm program, it would be better to disassemble in Scala. Since we're just doing a subset of instructions, this should be pretty easy.
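
As a starting point, a disassembler for a tiny subset could look like the following plain Scala sketch. The object and function names are hypothetical, and only addi, add, and sub are handled; anything else is reported as unknown.

```scala
// Hypothetical sketch covering only addi, add, and sub.
object DisasmSketch {
  def disassemble(inst: Int): String = {
    // Field positions follow the RISC-V base instruction formats.
    val opcode = inst & 0x7f
    val rd     = (inst >>> 7) & 0x1f
    val funct3 = (inst >>> 12) & 0x7
    val rs1    = (inst >>> 15) & 0x1f
    val rs2    = (inst >>> 20) & 0x1f
    val funct7 = (inst >>> 25) & 0x7f
    val immI   = inst >> 20 // arithmetic shift sign-extends the I-type immediate

    (opcode, funct3, funct7) match {
      case (0x13, 0, _)    => s"addi x$rd, x$rs1, $immI"
      case (0x33, 0, 0x00) => s"add x$rd, x$rs1, x$rs2"
      case (0x33, 0, 0x20) => s"sub x$rd, x$rs1, x$rs2"
      case _               => f"unknown (0x$inst%08x)"
    }
  }
}
```

For instance, `DisasmSketch.disassemble(0x01100513)` yields `addi x10, x0, 17`, and `0x04c50593` yields `addi x11, x10, 76`.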

Fix Treadle output directory bug

At the moment, setting the output directory of the simulators results in compilation errors due to a bug within Treadle. A temporary fix in #82 was to comment out the line which changed the output directory, but then this pollutes the current working directory with testing files.

Unless this behavior turns out to be intentional, in which case we need to refactor the Driver code to use FIRRTLMain instead, we will have to watch for when the Chisel developers fix the bug.

Add I/O information to the register file

Update readme

This is woefully out of date with lots of old information. This needs to be updated.

Add option for single stepper to print the register files values

We could have another command, say "p", to dump the current state of the register file.

In fact, this could be significantly expanded to print anything in the CPU. This should "just work" in theory.

Make output of branch control unit two bit

This output should be:
0->use pc+4
1->use pc+imm
2->use alu result

We would also need the two jump bits on the input of the branch control.
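
A plain Scala sketch of the proposed selection logic follows. The output encoding (0, 1, 2) is the one listed above. The encoding of the two jump bits (2 for jal, 3 for jalr, anything else meaning no jump) is an assumption of this sketch, and all names are hypothetical.

```scala
// Software model only. Names and the jump-bit encoding are assumptions.
object BranchControlSketch {
  val UsePcPlus4   = 0
  val UsePcPlusImm = 1
  val UseAluResult = 2

  // Assumed jump encoding: 2 = jal, 3 = jalr, anything else = no jump.
  def select(branchTaken: Boolean, jump: Int): Int =
    if (jump == 3) UseAluResult
    else if (jump == 2 || branchTaken) UsePcPlusImm
    else UsePcPlus4

  // Applies the two-bit selection to pick the next PC.
  def nextPc(sel: Int, pc: Long, imm: Long, aluResult: Long): Long = sel match {
    case UsePcPlusImm => pc + imm
    case UseAluResult => aluResult
    case _            => pc + 4
  }
}
```

With this shape, the next-PC mux in the data path only needs the single two-bit value rather than separate branch and jump signals.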

Push through ASIC flow and document

It would be awesome to have scripts to run through place and route so we can get area and cycle time.

Add a sw forwarding test for lab 3

We should have a test before the application tests that makes sure that sw data is forwarded correctly.

Make sure debug printing is consistent

Related to #23

Currently, when you run a single cycle in the single stepper, it's not clear whether you're seeing the output from before or after the cycle has happened. To make matters worse, I believe that some components print the "old" value and some the "new" value. Specifically, I believe registers and wires have different behavior.

We need to investigate this to figure out exactly what's going on. One solution would be to remove the printfs in the Chisel code and do everything from the Single stepper. Also, getting #23 would probably go a long way towards fixing this issue.

@cjnitta, anything to add to this?

Email from Chris:

I'm trying to help a student debug their circuit, and there seems to be
a delay with the update of wires by one cycle. I have a similar setup to
them, but I ran:
runMain dinocpu.singlestep addi2 pipelined
It is on the write data to the register file, and the toreg. I see it in
the bundle on cycle 4 to be 17 and 0, but don't see the update of
write_data until a cycle later. Below is the relevant code, and the
output from my single stepping. Is there a reason wires are delayed a
cycle when in regards to the printing?

Thanks,
Chris

val write_data = Wire(UInt(32.W))
...
when (mem_wb.wbcontrol.toreg === 1.U) {
  write_data := mem_wb.readdata
} .elsewhen (mem_wb.wbcontrol.toreg === 2.U) {
  write_data := mem_wb.wbcontrol.pcplusfour
} .otherwise {
  write_data := mem_wb.result
}
...
printf("writedata: %d\ntoreg: %d\n", write_data, mem_wb.wbcontrol.toreg)


Cycles > 1
Cycle=1 IF/ID: Bundle(instruction -> 17827091, pc -> 0, pcplusfour -> 4)
DASM(1100513)
ID/EX: Bundle(excontrol -> Bundle(pc -> 0, sextImm -> 0, readdata1 -> 0, 
readdata2 -> 0, rs1 -> 0, rs2 -> 0, funct7 -> 0, add -> 0, immediate -> 
0, alusrc1 -> 0, branch -> 0, jump -> 0), mcontrol -> Bundle(memread -> 
0, memwrite -> 0, funct3 -> 0), wbcontrol -> Bundle(pcplusfour -> 0, 
toreg -> 3, regwrite -> 0, rd -> 0))
EX/MEM: Bundle(brtarget -> 0, brtaken -> 0, result -> 0, writedata -> 0, 
mcontrol -> Bundle(memread -> 0, memwrite -> 0, funct3 -> 0), wbcontrol 
-> Bundle(pcplusfour -> 0, toreg -> 0, regwrite -> 0, rd -> 0))
MEM/WB: Bundle(result -> 0, readdata -> 19, wbcontrol -> 
Bundle(pcplusfour -> 0, toreg -> 0, regwrite -> 0, rd -> 0))
writedata: 0
toreg: 0
---------------------------------------------
Cycles > 1
Cycle=2 IF/ID: Bundle(instruction -> 80020883, pc -> 4, pcplusfour -> 8)
DASM(4c50593)
ID/EX: Bundle(excontrol -> Bundle(pc -> 0, sextImm -> 17, readdata1 -> 
0, readdata2 -> 0, rs1 -> 0, rs2 -> 17, funct7 -> 0, add -> 0, immediate 
-> 1, alusrc1 -> 0, branch -> 0, jump -> 0), mcontrol -> Bundle(memread 
-> 0, memwrite -> 0, funct3 -> 0), wbcontrol -> Bundle(pcplusfour -> 4, 
toreg -> 0, regwrite -> 1, rd -> 10))
EX/MEM: Bundle(brtarget -> 0, brtaken -> 0, result -> 0, writedata -> 0, 
mcontrol -> Bundle(memread -> 0, memwrite -> 0, funct3 -> 0), wbcontrol 
-> Bundle(pcplusfour -> 0, toreg -> 3, regwrite -> 0, rd -> 0))
MEM/WB: Bundle(result -> 0, readdata -> 19, wbcontrol -> 
Bundle(pcplusfour -> 0, toreg -> 0, regwrite -> 0, rd -> 0))
writedata: 0
toreg: 0
---------------------------------------------
Cycles > 1
Cycle=3 IF/ID: Bundle(instruction -> 19, pc -> 8, pcplusfour -> 12)
DASM(13)
ID/EX: Bundle(excontrol -> Bundle(pc -> 4, sextImm -> 76, readdata1 -> 
0, readdata2 -> 0, rs1 -> 10, rs2 -> 12, funct7 -> 2, add -> 0, 
immediate -> 1, alusrc1 -> 0, branch -> 0, jump -> 0), mcontrol -> 
Bundle(memread -> 0, memwrite -> 0, funct3 -> 0), wbcontrol -> 
Bundle(pcplusfour -> 8, toreg -> 0, regwrite -> 1, rd -> 11))
EX/MEM: Bundle(brtarget -> 16, brtaken -> 0, result -> 17, writedata -> 
0, mcontrol -> Bundle(memread -> 0, memwrite -> 0, funct3 -> 0), 
wbcontrol -> Bundle(pcplusfour -> 4, toreg -> 0, regwrite -> 1, rd -> 10))
MEM/WB: Bundle(result -> 0, readdata -> 19, wbcontrol -> 
Bundle(pcplusfour -> 0, toreg -> 3, regwrite -> 0, rd -> 0))
writedata: 0
toreg: 3
---------------------------------------------
Cycles > 1
Cycle=4 IF/ID: Bundle(instruction -> 19, pc -> 12, pcplusfour -> 16)
DASM(13)
ID/EX: Bundle(excontrol -> Bundle(pc -> 8, sextImm -> 0, readdata1 -> 0, 
readdata2 -> 0, rs1 -> 0, rs2 -> 0, funct7 -> 0, add -> 0, immediate -> 
1, alusrc1 -> 0, branch -> 0, jump -> 0), mcontrol -> Bundle(memread -> 
0, memwrite -> 0, funct3 -> 0), wbcontrol -> Bundle(pcplusfour -> 12, 
toreg -> 0, regwrite -> 1, rd -> 0))
EX/MEM: Bundle(brtarget -> 92, brtaken -> 0, result -> 93, writedata -> 
0, mcontrol -> Bundle(memread -> 0, memwrite -> 0, funct3 -> 0), 
wbcontrol -> Bundle(pcplusfour -> 8, toreg -> 0, regwrite -> 1, rd -> 11))
MEM/WB: Bundle(result -> 17, readdata -> 19, wbcontrol -> 
Bundle(pcplusfour -> 4, toreg -> 0, regwrite -> 1, rd -> 10))
writedata: 0
toreg: 0
---------------------------------------------
Cycles > 1
Cycle=5 IF/ID: Bundle(instruction -> 19, pc -> 16, pcplusfour -> 20)
DASM(13)
ID/EX: Bundle(excontrol -> Bundle(pc -> 12, sextImm -> 0, readdata1 -> 
0, readdata2 -> 0, rs1 -> 0, rs2 -> 0, funct7 -> 0, add -> 0, immediate 
-> 1, alusrc1 -> 0, branch -> 0, jump -> 0), mcontrol -> Bundle(memread 
-> 0, memwrite -> 0, funct3 -> 0), wbcontrol -> Bundle(pcplusfour -> 16, 
toreg -> 0, regwrite -> 1, rd -> 0))
EX/MEM: Bundle(brtarget -> 0, brtaken -> 0, result -> 0, writedata -> 0, 
mcontrol -> Bundle(memread -> 0, memwrite -> 0, funct3 -> 0), wbcontrol 
-> Bundle(pcplusfour -> 12, toreg -> 0, regwrite -> 1, rd -> 0))
MEM/WB: Bundle(result -> 93, readdata -> 0, wbcontrol -> 
Bundle(pcplusfour -> 8, toreg -> 0, regwrite -> 1, rd -> 11))
writedata: 17
toreg: 0
---------------------------------------------
Cycles > q

For the pipeline diagram want control bundles

To better match the book, we want to show the ex control, wb control, and mem control in the diagram.

Duplicate Test Definitions

Lab1Test.scala and Lab2Test.scala redefine the tests that are already present in InstTests.scala.

Move pipeline code to dinocpu.pipelined package

This should be simple to tackle - to close #86 and help make the dinocpu namespace cleaner, the pipelined CPU models and stage register should all go into a dinocpu.pipelined package.

Multi-issue

It should be pretty straightforward to add multi-issue support.

Update the fetch logic to get more data
Update the hazard detection logic
Add more ports to the register file (or duplicate it?)
Fix the forwarding logic
Add new hazards (structural). E.g., only one instruction can access data memory at a time
Add another ALU
All of the details I'm not thinking about

Add example single stepping with register

Add an example so students see how the registers and wires behave when single stepping.

Running single test packages does not work as documented

In Lab1Test.scala, the documentation says to run sbt 'testOnly dinocpu.SingleCycleMultiCycleTesterLab1'.
This command reports that no tests were run:

sbt 'testOnly dinocpu.SingleCycleMultiCycleTesterLab1'
[info] Loading global plugins from /home/daniel/.sbt/1.0/plugins
[info] Loading settings for project dinocpu-build from plugins.sbt ...
[info] Loading project definition from /home/daniel/Documents/dinocpu/project
[info] Loading settings for project root from build.sbt ...
[info] Set current project to dinocpu (in build file:/home/daniel/Documents/dinocpu/)
[info] ScalaTest
[info] Run completed in 39 milliseconds.
[info] Total number of tests run: 0
[info] Suites: completed 0, aborted 0
[info] Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
[info] No tests were executed.
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for Test / testOnly
[success] Total time: 2 s, completed May 3, 2019 10:12:58 AM

Clean things up for DINO CPU release

Copy documentation from class page
Clean up documentation
Split out the pipelined CPU from branch predictor (#61)
Add in assignment descriptions
go through issues to see if any should be closed
Better document the new single stepper REPL thing

Other issues to look at

Bug in reading bytes/halfwords

From @cjnitta:

Jason,
A student pointed out that there might be a bug in the memory.scala.
Starting on line 73, it might need to be:

when (maskmode =/= 2.U) { // When not loading a whole word
  val offset = io.dmem.address(1,0)

  when (maskmode === 0.U) { // Reading a byte
    readdata := (memory(io.dmem.address >> 2) >> (offset * 8.U)) & 0xff.U
  } .otherwise { // Reading a halfword
    readdata := (memory(io.dmem.address >> 2) >> ((offset & 0x2.U) * 8.U)) & 0xffff.U
  }
} .otherwise {
  readdata := memory(io.dmem.address >> 2)
}

The difference was the shifting down of the byte or half word. It didn't
seem to be a problem in the first 3 projects so far.

Chris

I think both lines could be >> (offset * 8.U) since the halfword must be 16-bit aligned.
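
The effect of the shift can be checked with a small software model in plain Scala. This is only a sketch: the object and function names are hypothetical, and the maskmode encoding (0 = byte, 1 = halfword, 2 = word) is read off the snippet above. It uses the simpler `offset * 8` shift for both cases, which matches the `offset & 2` form whenever the halfword address is 16-bit aligned.

```scala
// Software model of the load path only. Names are hypothetical.
object LoadModelSketch {
  // word:     the 32-bit word stored at (address >> 2)
  // maskmode: 0 = byte, 1 = halfword, 2 = whole word
  def read(word: Int, address: Int, maskmode: Int): Int = {
    val offset = address & 0x3 // low two address bits select the byte within the word
    maskmode match {
      case 0 => (word >>> (offset * 8)) & 0xff   // shift the byte down, then mask
      case 1 => (word >>> (offset * 8)) & 0xffff // halfword, assumed 16-bit aligned
      case _ => word
    }
  }
}
```

With the word 0x11223344, a byte read at offset 1 returns 0x33 and a halfword read at offset 2 returns 0x1122. Without the shift, both reads would return bits from the bottom of the word regardless of the offset.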

Add branch forward test for lab 3

This was another common error on lab 3.

Add csr instructions

For the class, we ignored the following instructions. If we added them, it would increase the number of workloads we could run and make it easier to use GCC to compile.

We might be able to get away with adding a "CSR Unit" to hide most of this complexity.
Now that I'm thinking about it, we might be able to do this for the class version, too!

Add cache

There are many steps to this that should be their own issues, but I'm going to put everything here for now.

Add a non-combinational memory that takes a configurable number of cycles (#42)
Update combinational memory to use the same interface as the non-combinational
Document new interface as shown in #78 (review)
Factor out function for dmem masking
Update pipeline to be able to use the async memory
Write a simple direct-mapped write-through cache
Make the cache set associative (configurable)
Make the cache write back (configurable)
Make the block size configurable (related to #65)

This would be a great future assignment as well.

Add checks for unaligned/illegal memory accesses

Relates to #34. We can also add a wire back to the CSR/interrupt unit to generate an interrupt in this case.

Alternatively, we could check for this error condition in the CSR/interrupt controller. Maybe that would be better, in fact.

Minor bug with included synchronous memory write test

While writing up the asynchronous memory tests, I noticed in the normal memory unit test the "store and load words" test is using the MemoryUnitReadTester when it seems that it should be using MemoryUnitWriteTester.

I corrected this in jardhu/dinocpu@55fbcc9 but the test failed when executed:

[info] [0.000] SEED 1555963536079
[info] [0.012] EXPECT AT 257   io_imem_instruction got 355 expected 100 FAIL
[info] [0.012] EXPECT AT 257   io_dmem_readdata got 355 expected 100 FAIL
[info] [0.013] EXPECT AT 258   io_imem_instruction got 355 expected 101 FAIL
[info] [0.013] EXPECT AT 258   io_dmem_readdata got 355 expected 101 FAIL
[info] [0.013] EXPECT AT 259   io_imem_instruction got 355 expected 102 FAIL
[info] [0.013] EXPECT AT 259   io_dmem_readdata got 355 expected 102 FAIL
...
[info] [0.060] EXPECT AT 766   io_imem_instruction got 355 expected 609 FAIL
[info] [0.060] EXPECT AT 766   io_dmem_readdata got 355 expected 609 FAIL
[info] [0.060] EXPECT AT 767   io_imem_instruction got 355 expected 610 FAIL
[info] [0.060] EXPECT AT 767   io_dmem_readdata got 355 expected 610 FAIL
[info] [0.060] EXPECT AT 768   io_imem_instruction got 355 expected 611 FAIL
[info] [0.060] EXPECT AT 768   io_dmem_readdata got 355 expected 611 FAIL
test DualPortedMemory Success: 2 tests passed in 773 cycles in 0.061594 seconds 12549.95 Hz
[info] [0.060] RAN 768 CYCLES FAILED FIRST AT CYCLE 257
[info] MemoryTester:
[info] DualPortedMemory
[info] - should have all zeros (with treadle)
[info] DualPortedMemory
[info] - should have increasing words (with treadle)
[info] DualPortedMemory
[info] - should store and load words (with treadle) *** FAILED ***
[info]   false was not true (MemoryUnitTest.scala:95)

It couldn't possibly be the memory module or else we would've had lots of problems involving writing to memory during WQ19, so it must be the unit test itself. Right now I'm working on fixing it but I didn't expect the test to just flat out fail, which is why I'm writing up this issue in advance.

Hook disassembler into new REPL interface

Use Chisel release 3.2.0

https://github.com/freechipsproject/chisel3/releases/tag/v3.2.0

peeking and poking csr signals for testing doesn't work because csr io and regfile are renamed in a testing-unfriendly way

CSR testing needs to be looked into. After building, the register file contents are renamed, and thus using peek and poke becomes impractical.

Go through all conditions and make logical vs bitwise consistent

The use of logical not (!) vs. bitwise not (~), logical and (&&) vs. bitwise and (&), etc. should be consistent across all conditions.

Move getting started from lab1 document to this repo

One of the major problems students had was knowing where to look for documentation. We should have all of the DINO CPU documentation here, not in the lab1 document.

Add ebreak and ecall instructions

These instructions raise a breakpoint exception for the debugger and make a request to the execution environment by raising an Environment Call exception.

We might be able to integrate this into the #34 CSR unit since it's the same opcode.

This would also make adding more workloads #29 easier, and potentially allow us to make a better host-guest interface.

Implementing ecall would also allow us to get printf working!

split branch prediction CPU into its own file

It would be better to have a separate file for the branch predictor CPU data path than replacing the simple pipelined CPU data path. It's a little unfortunate to have lots of code duplication, but, overall, I think it's better to split this out on its own.

I foresee having many different pipeline designs in the pipelined/ directory:

cpu-simple.scala: Base pipelined design (e.g., answer to lab3)
cpu-bp.scala: Pipelined design with a branch predictor
cpu-seqmem.scala: Pipelined design with sequential instead of asynchronous memory.
Others????

Travis CI fails builds due to java.lang.NoClassDefFoundError

Seems like every PR is going to fail the Travis CI check now due to this error occurring in every single unit test. Here's an example with my asynchronous memory PR. Some kind of misconfigured dependency messing with the Apache Commons Text library, maybe?

It only started happening after #55 was merged into master, so @DanG100 could you look at this?

Lab1 Tests: Split out add0 into its own test

Next time, add0 should be its own test so you can get credit for the rest of the tests passing even if you don't deal with the zero register correctly.

Create different InsnTest cases for non-combinational mem + pipelined CPU

Right now, since the tests were built with only combinational memory on hand, the InsnTests assume that the CPUs use combinational memory. This poses a problem for testing the pipelined CPU with the non-combinational memory. Since memory accesses are delayed, these tests inherently take longer to execute, and the short cycle count the tests give the simulator does not allow the CPUs to finish execution.

To remedy this, the testing suites should be revised to include a combinational and non-combinational memory section, which assigns the memType and memPortType parameters in the configuration file appropriately. This will permit the tester drivers to thoroughly test the non-combinational pipelined + pipelined with branch predictor CPU models.

I do not know if there is an exact formula that determines how many cycles the non-combinational pipeline should take, though after experimenting with CPUTesterDriver I found that multiplying the number of combinational cycles by the delay suffices for all of the tests to complete.