What is an accumulator based CPU machine

CPU design with VHDL


Examiner: Marc Reichenbach
Assessor: Konrad Häublein

Preparation / general

To prepare, I went through the slides several times. Towards the end, the lecture includes a small project, which I had worked on with my partner before the exam; that was quite helpful for reviewing VHDL. I also looked at the CPU from the exercises again. For the actual exam, the blackboard sketches and the statements Marc made during the lecture are very important, so I worked through my notes again.

The exam was done with pen and paper. I had to write some VHDL and sketch RTL schematics and a minimal CPU. Only the examiner asked questions. Marc looks slightly critical the whole time, even when you say the right thing; don't let that unsettle you. The atmosphere was pleasant.

Examination process

The exam was split roughly 50/50 between VHDL and CPU design, although Marc ran over time on the VHDL part. In general, the exam was very similar to the other exam report from this semester.

VHDL

General

Q: What is VHDL?

  • Specification and modeling language with parts that can be synthesized

Q: Synthesizable parts. How do I describe them?

  • Structure, data flow, process description

Q: How is a normal VHDL file structured?

  • Entity description ("interface")
  • Architecture ("functional behavior")
  • Configuration ("selection of a specific architecture")
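A minimal sketch of such a file, to make the three parts concrete. The entity name mux2 and its ports are made up for illustration; the two architectures also show a dataflow and a process description of the same function, and the configuration picks one of them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Entity: the interface (hypothetical 2:1 multiplexer)
entity mux2 is
  port (
    a, b : in  std_logic;
    sel  : in  std_logic;
    y    : out std_logic
  );
end entity mux2;

-- Architecture 1: dataflow description
architecture dataflow of mux2 is
begin
  y <= a when sel = '0' else b;
end architecture dataflow;

-- Architecture 2: process description of the same behavior
architecture behavioral of mux2 is
begin
  process (a, b, sel)
  begin
    if sel = '0' then
      y <= a;
    else
      y <= b;
    end if;
  end process;
end architecture behavioral;

-- Configuration: selects one specific architecture for the entity
configuration mux2_cfg of mux2 is
  for dataflow
  end for;
end configuration mux2_cfg;
```

If no configuration is given, tools typically fall back to a default binding, so the configuration part is often left out in small designs.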

Q: Libraries. Which important one do you use, and do you actually always need it?

  • IEEE standard lib (exact name was not asked). Standard types, e.g. std_logic, std_logic_vector.
  • On follow-up: 9-valued logic, why do you want that ...
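One common argument for the 9-valued std_logic type is that it can model a shared line with several drivers. The following self-checking simulation sketch (entity and signal names are made up) shows three of the nine values: 'Z' for an undriven line, a normal '1', and 'X' for a driver conflict.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity nine_value_demo is
end entity nine_value_demo;

architecture sim of nine_value_demo is
  signal bus_line     : std_logic;         -- resolved type: several drivers allowed
  signal drv_a, drv_b : std_logic := 'Z';  -- both drivers start in high impedance
begin
  -- Two concurrent assignments = two drivers on the same signal
  bus_line <= drv_a;
  bus_line <= drv_b;

  check : process
  begin
    wait for 1 ns;
    assert bus_line = 'Z' report "expected high impedance" severity failure;

    drv_a <= '1';              -- only one driver is active
    wait for 1 ns;
    assert bus_line = '1' report "expected '1'" severity failure;

    drv_b <= '0';              -- second driver fights the first one
    wait for 1 ns;
    assert bus_line = 'X' report "expected conflict 'X'" severity failure;

    wait;                      -- end of simulation
  end process check;
end architecture sim;
```

With a plain two-valued type, neither the undriven line nor the conflict could be expressed in simulation.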

Q: Please write a counter in VHDL.

    signal count : std_logic_vector(...);  -- depending on the width

    process (clk, reset)
    begin
      if reset = '1' then                  -- asynchronous reset
        count <= "00...";
      elsif clk'event and clk = '1' then   -- we want a clocked storage element (register)
        count <= count + 1;                -- "+" on std_logic_vector needs e.g. ieee.std_logic_unsigned
      end if;
    end process;
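For reference, a self-contained variant of the fragment above that should compile as-is. The entity name, the port q, and the generic WIDTH with its default of 8 are assumptions for this sketch; the exam left the width open. It uses numeric_std and an unsigned signal instead of adding directly on std_logic_vector.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counter is
  generic (WIDTH : positive := 8);  -- assumed width, not from the exam
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    q     : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity counter;

architecture rtl of counter is
  signal count : unsigned(WIDTH - 1 downto 0);
begin
  process (clk, reset)
  begin
    if reset = '1' then             -- asynchronous reset
      count <= (others => '0');
    elsif rising_edge(clk) then     -- clocked register
      count <= count + 1;           -- wraps around to 0 on overflow
    end if;
  end process;

  q <= std_logic_vector(count);
end architecture rtl;
```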

Q: Please draw the RTL for it.

    +----------------------------------+
    |                                  |
    |    +-----+      +----------+     |
    +--->|     |      |          |     |
         |  +  |----->| Register |-----+---->
  '1' -->|     |      |          |
         +-----+      +----------+
                        |      |
                       clk   reset

Q: How fast can the counter clock now?

  • Depending on the speed of the adder
  • → critical path: where does it run, and what does it depend on (width of the adder; briefly explained carry-ripple vs. carry-lookahead)
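The critical-path argument can be written as a rough timing budget. The symbols are assumptions for this sketch and were not part of the exam: t_cq is the clock-to-output delay of the register, t_add(n) the delay of the n-bit adder, t_su the setup time of the register, and t_FA the delay of one full adder. Routing delay is ignored, and the logarithmic bound assumes a hierarchical carry-lookahead adder.

```latex
\begin{align*}
T_{\min} &= t_{cq} + t_{\mathrm{add}}(n) + t_{su} \\
f_{\max} &= \frac{1}{T_{\min}} \\
t_{\mathrm{add}}^{\mathrm{ripple}}(n) &\approx n \cdot t_{FA} \\
t_{\mathrm{add}}^{\mathrm{lookahead}}(n) &\in O(\log n)
\end{align*}
```

So doubling the counter width roughly doubles the adder delay for carry-ripple, but adds only a constant for carry-lookahead, at the cost of more logic.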

FPGA

Q: How does VHDL get onto the FPGA?

  • Implementation (in particular, the difference between mapping and place & route was asked for)
    • Translation (linker for netlists, plus the transition from Unisim → Simprim, i.e. simulation components → high level FPGA description)
    • Mapping (which elements are used for the respective components, e.g. an AND → LUTs)
    • Place & Route (which LUT of the FPGA, and how to connect them)

Functional simulation

Q: Why do you want to simulate?

  • It is difficult to look into hardware: oscillations, metastable signals ...
  • Building hardware is expensive, and too costly and tedious for "quick testing"

Q: How does this work?

  • Event-driven simulation (why do you do this? → discretization)
    • Event lists on signals with timestamps for the events / transactions
    • Delta cycles to resolve concurrent assignments, since the simulating CPU executes sequentially → "logical clock"
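A small self-checking sketch of what delta cycles buy you (entity and signal names are made up). Both assignments in the swap process read the old values and only take effect one delta cycle later, so the two signals are exchanged even though the simulator executes the statements one after the other.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delta_demo is
end entity delta_demo;

architecture sim of delta_demo is
  signal a       : std_logic := '0';
  signal b       : std_logic := '1';
  signal trigger : std_logic := '0';
begin
  -- Both assignments are scheduled for the next delta cycle,
  -- so each one still sees the old value of the other signal.
  swap : process (trigger)
  begin
    if trigger = '1' then
      a <= b;
      b <= a;
    end if;
  end process swap;

  stimulus : process
  begin
    wait for 10 ns;
    trigger <= '1';            -- event on trigger wakes up the swap process
    wait for 10 ns;
    assert a = '1' and b = '0'
      report "signals were not swapped" severity failure;
    wait;                      -- end of simulation
  end process stimulus;
end architecture sim;
```

With variables instead of signals, the second assignment would already see the new value and the swap would not work.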

CPU

Q: Now we have such a counter. What can it be used for?

Single cycle CPU

Q: Elements of a simple CPU? (+ draw them and roughly "wire" them up)

  • + Decoder, addition units for PC and branches, ...

Q: Also possible without RegFile?

  • Yes, you could also use SRAM / BRAM
  • Stack or accumulator based architecture
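To illustrate the accumulator-based option: instead of a register file there is a single register, and every instruction implicitly uses it as one operand and as the destination. A minimal datapath sketch follows; the entity name, the 8-bit width, and the 2-bit op encoding are assumptions for this sketch and not taken from the lecture.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity acc_datapath is
  port (
    clk     : in  std_logic;
    reset   : in  std_logic;
    op      : in  std_logic_vector(1 downto 0);  -- hypothetical encoding, see below
    operand : in  unsigned(7 downto 0);          -- value read from memory
    acc_out : out unsigned(7 downto 0)           -- value a STORE would write back
  );
end entity acc_datapath;

architecture rtl of acc_datapath is
  signal acc : unsigned(7 downto 0);             -- the only data register
begin
  process (clk, reset)
  begin
    if reset = '1' then
      acc <= (others => '0');
    elsif rising_edge(clk) then
      case op is
        when "01"   => acc <= operand;        -- LOAD: acc := mem
        when "10"   => acc <= acc + operand;  -- ADD:  acc := acc + mem
        when others => null;                  -- NOP / STORE: acc unchanged
      end case;
    end if;
  end process;

  acc_out <= acc;
end architecture rtl;
```

The price is more memory traffic, since every second operand comes from memory rather than from a register.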

Q: Explain the data paths for each of the following instructions:

    add $1, $2, $3        # General
    sw  $5, 16($4)        # Routing the register value past the ALU
    add $4, $1, $2, $3    # Add with three source registers

Q: Is instruction three possible?

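  • Expected answer: No, it doesn't work with our exercise CPU. To make it work you would need either
    • a register file with three read ports and an ALU with three inputs, or
    • an assembler macro that "translates" it into two adds using a temporary register reserved for this purpose (e.g. add $t, $1, $2 followed by add $4, $t, $3, with $t standing for that reserved register)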

Pipelining

Due to time constraints, the general part (drawing in the stages, etc.) was skipped.

Q: We want to pipeline the ALU. What do you have to consider?

  • Depends on how complex the ALU is
  • Adders, multipliers, and CORDIC pipeline well
  • But: a barrel shifter can be pipelined, but forwarding is not possible because all bits can change in every stage (on request I had to draw a barrel shifter)
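A sketch of a barrel shifter as a cascade of multiplexer stages, which makes the forwarding argument visible: each stage can move every bit, so no intermediate value is already a usable partial result. The 8-bit width, the names, and the choice of a logical left shift are assumptions for this sketch. Pipeline registers would go between the stages.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity barrel_shifter is
  port (
    din   : in  std_logic_vector(7 downto 0);
    shamt : in  std_logic_vector(2 downto 0);  -- shift amount 0..7
    dout  : out std_logic_vector(7 downto 0)
  );
end entity barrel_shifter;

architecture dataflow of barrel_shifter is
  signal s1, s2 : std_logic_vector(7 downto 0);  -- results after stage 0 and 1
begin
  -- Stage 0: shift left by 1 if shamt(0) is set
  s1   <= din(6 downto 0) & '0'   when shamt(0) = '1' else din;
  -- Stage 1: shift left by 2 if shamt(1) is set
  s2   <= s1(5 downto 0) & "00"   when shamt(1) = '1' else s1;
  -- Stage 2: shift left by 4 if shamt(2) is set
  dout <= s2(3 downto 0) & "0000" when shamt(2) = '1' else s2;
end architecture dataflow;
```

For example, din = "00000011" with shamt = "101" passes through stage 0 as "00000110", stays unchanged in stage 1, and leaves stage 2 as "01100000".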

Q: Where can a structural hazard occur in our CPU?

  • Possibly, if instruction memory and data memory are the same memory
  • Solution: stalls, or move the conflict one level further down (e.g. caches)

Q: Please draw the solution with the additional level

  • I$, D$, arbiter, L2$, RAM

Q: How would you build such an arbiter?

  • Many possibilities. A simple one: a state machine that serves D$ before I$
                       /------------\
      +--------------->|    idle    |<--------------+
      |                \------------/               |
      |                  /        \                 |
      |   i$ req && !d$ req        d$ req           |
      |                v            v               |
      |       /----------\      /----------\        |
      |       | i$ fetch |      | d$ fetch |        |
      |       \----------/      \----------/        |
      |           ...               ...      <- Get the data from memory
      |       /----------\      /----------\        |
      |       |  i$ fin  |      |  d$ fin  |        |
      |       \----------/      \----------/        |
      |            |                 |              |
      +------------+                 +--------------+
  • Can it be made more minimal? Yes, merge d$ fetch into idle. But that causes useless traffic on the memory bus → a bad idea, e.g. with multicore.
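A minimal sketch of the simple five-state arbiter in VHDL. All port names are made up for this sketch; in particular mem_done, which signals that the memory access has finished, stands in for the "..." part of the diagram. D$ requests are checked first, so they win when both caches request at the same time.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity cache_arbiter is
  port (
    clk, reset : in  std_logic;
    i_req      : in  std_logic;  -- request from I$
    d_req      : in  std_logic;  -- request from D$
    mem_done   : in  std_logic;  -- hypothetical: memory access has finished
    i_ack      : out std_logic;  -- data for I$ is ready
    d_ack      : out std_logic   -- data for D$ is ready
  );
end entity cache_arbiter;

architecture fsm of cache_arbiter is
  type state_t is (idle, i_fetch, d_fetch, i_fin, d_fin);
  signal state : state_t;
begin
  process (clk, reset)
  begin
    if reset = '1' then
      state <= idle;
    elsif rising_edge(clk) then
      case state is
        when idle =>
          if d_req = '1' then          -- D$ has priority
            state <= d_fetch;
          elsif i_req = '1' then       -- i_req and not d_req
            state <= i_fetch;
          end if;
        when i_fetch =>
          if mem_done = '1' then
            state <= i_fin;
          end if;
        when d_fetch =>
          if mem_done = '1' then
            state <= d_fin;
          end if;
        when i_fin | d_fin =>
          state <= idle;
      end case;
    end if;
  end process;

  -- Outputs depend only on the current state
  i_ack <= '1' when state = i_fin else '0';
  d_ack <= '1' when state = d_fin else '0';
end architecture fsm;
```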