Git Product home page Git Product logo

Comments (35)

sylefeb avatar sylefeb commented on June 11, 2024 1

Hi Rob, Thanks for the tests!

Yes, this will be completely automated. I mentioned the rewrite as a current way to avoid the extra cycle. Last night I pushed in draft a prototype that implements this cycle-skip and also outputs a change-log report in the console. The report points out everywhere the change possibly has an effect. This should help track potential issues.

Please consider it still preliminary and work in progress, I am hoping to do more thorough experiments in the coming evenings.

I am also looking into other interesting similar cases relating to if/else. Working on it but this requires a bit more subtlety.

Thanks everyone!
Sylvain

from silice.

sylefeb avatar sylefeb commented on June 11, 2024 1

Yes, agreed. This is indeed the new behavior with latest changes in draft:

while (C) {
  __display("in loop");
}
__display("after");

Now gives (formatted for clarity):

2: begin
  // __while__block_1
  if (_c_C) begin
    // __block_2
    // __block_4
    $display("in loop");
    // __block_5
    _d__idx_fsm0 = 2;
  end else begin
    // __block_3
    $display("after");
    _d__idx_fsm0 = 0;
  end
end

Note how __display("after"); has been collapsed into the loop conditional.
This is the same as:

while (1) {
  if (C) {
    __display("in loop");
  } else {
    __display("after");
    break;
  }
}

which gives (as before):

2: begin
// __while__block_1
if (1) begin
  // __block_2
  // __block_4
  if (_c_C) begin
    // __block_5
    // __block_7
    $display("in loop");  
    // __block_8
    // __block_12
    // __block_13
    _d__idx_fsm0 = 2;
  end else begin
    // __block_6
    // __block_9
    $display("after");
    _d__idx_fsm0 = 0;
  end
end else begin
  _d__idx_fsm0 = 0;
end
end

Both are now exactly the same result (ignoring the if (1) that has no effect).

Regarding the waiting loop:

  __display("before");
  while (C) { __display("waiting"); }
  __display("thing done after");
++:
  __display("a later state");

Now gives (again formatted for clarity):

1: begin
  // _top
  $display("before");
  _d__idx_fsm0 = 2;
end
2: begin
  // __while__block_1
  if (_c_C) begin
    // __block_2
    // __block_4
    $display("waiting");
    // __block_5
    _d__idx_fsm0 = 2;
  end else begin
    // __block_3
    $display("thing done after");
    _d__idx_fsm0 = 3;
  end
end
3: begin
  // __block_6
  $display("a later state");
  _d__idx_fsm0 = 0;
end

Note how __display("thing done after"); is now in the else part of the loop conditional, it will be reached without any intermediate cycle when C becomes false. I however do need at least one state for the while loop unless Silice starts to allow duplicating design parts -- which is possible but would be hard to predict for the user (and potentially a hefty hidden burden on fpga resources, keeping in mind the duplications might cascade).

Why the need to duplicate? Silice would have to automatically rewrite as such:

__display("before");
if (C) {
  while (1) { 
    if (C) { __display("waiting"); }
    else { __display("thing done after"); /*copy1*/}
  }
} else {
  __display("thing done after"); /*copy2*/
}

This is possible to do explicitly 'by hand' if the code duplication is acceptable (there is btw a related case in the documentation). Everything up the the next state has to be duplicated: there should be nothing left between after the if/else and the next state.

I am exploring other such improvements.

@rob-ng15 was the change log output useful? it should accurately tell which loops are being changed (and those not mentioned should behave the same as before).

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024 1

@sylefeb Yes I did have a look at the change log messages, they directed me to pretty much where I'd expect the affected while loops would be. There was one message pointing towards a unit with no while loops.

I'm happy for these changes to go ahead, and I'll adjust PAWSv2, probably just the RV64 to accommodate.

Rob

from silice.

suarezvictor avatar suarezvictor commented on June 11, 2024 1

This is great news and to me the best that can be done is to add options for optimization levels. If they can be passed by command line, it would be good. If they can be overridden per-function, it would be better.

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024 1

Hi @sylefeb

Many thanks. The SD CARD I am using is read ok, as of this morning.

I'll try the updated changes tomorrow and let you know.

Rob.

from silice.

suarezvictor avatar suarezvictor commented on June 11, 2024

I think a good way to save many cycles is as follows.
In the following code, there are many else blocks that the only thing they do is to set next state:

module M_uart_tx (
in_data,
in_clock_counter,
out_tx_pin,
out_out_valid,
in_run,
out_done,
reset,
out_clock,
clock
);
input  [7:0] in_data;
input signed [31:0] in_clock_counter;
output  [0:0] out_tx_pin;
output  [0:0] out_out_valid;
input in_run;
output out_done;
input reset;
output out_clock;
input clock;
assign out_clock = clock;

reg signed [31:0] _d_t1;
reg signed [31:0] _q_t1;
reg  [7:0] _d___block_3_mask;
reg  [7:0] _q___block_3_mask;
reg  [0:0] _d_tx_pin;
reg  [0:0] _q_tx_pin;
reg  [0:0] _d_out_valid;
reg  [0:0] _q_out_valid;
reg  [3:0] _d__idx_fsm0,_q__idx_fsm0;
assign out_tx_pin = _q_tx_pin;
assign out_out_valid = _q_out_valid;
assign out_done = (_q__idx_fsm0 == 0)
;



`ifdef FORMAL
initial begin
assume(reset);
end
assume property($initstate || (in_run || out_done));
`endif
always @* begin
_d_t1 = _q_t1;
_d___block_3_mask = _q___block_3_mask;
_d_tx_pin = _q_tx_pin;
_d_out_valid = _q_out_valid;
_d__idx_fsm0 = _q__idx_fsm0;
// _always_pre
(* full_case *)
case (_q__idx_fsm0)
1: begin
// _top
_d_t1 = in_clock_counter+1;

_d_tx_pin = 0;

_d_out_valid = 1;

_d__idx_fsm0 = 2;
end
2: begin
// __while__block_1
if (1&&in_clock_counter-_q_t1<0) begin
// __block_2
// __block_4
_d_out_valid = 0;

// __block_5
_d__idx_fsm0 = 2;
end else begin
_d__idx_fsm0 = 3;
end
end
3: begin
// __block_3
_d_t1 = _q_t1+1;

_d___block_3_mask = 1;

_d__idx_fsm0 = 4;
end
4: begin
// __while__block_6
if (_q___block_3_mask!=0) begin
// __block_7
// __block_9
// __block_10
_d_tx_pin = (in_data&_q___block_3_mask)!=0;

_d_out_valid = 1;

_d__idx_fsm0 = 8;
end else begin
_d__idx_fsm0 = 5;
end
end
8: begin
// __while__block_11
if (1&&in_clock_counter-_q_t1<0) begin
// __block_12
// __block_14
_d_out_valid = 0;

// __block_15
_d__idx_fsm0 = 8;
end else begin
_d__idx_fsm0 = 9;
end
end
5: begin
// __block_8
_d_tx_pin = 1;

_d_out_valid = 1;

_d__idx_fsm0 = 6;
end
9: begin
// __block_13
_d_t1 = _q_t1+1;

// __block_16
_d___block_3_mask = _q___block_3_mask<<1;

// __block_17
_d__idx_fsm0 = 4;
end
6: begin
// __while__block_18
if (1&&in_clock_counter-_q_t1<0) begin
// __block_19
// __block_21
_d_out_valid = 0;

// __block_22
_d__idx_fsm0 = 6;
end else begin
_d__idx_fsm0 = 0;
end
end
7: begin
// __block_20
_d__idx_fsm0 = 0;
end
0: begin 
end
default: begin 
_d__idx_fsm0 = {4{1'bx}};
`ifdef FORMAL
assume(0);
`endif
 end
endcase
// _always_post
// pipeline stage triggers
end

always @(posedge clock) begin
_q_t1 <= _d_t1;
_q___block_3_mask <= _d___block_3_mask;
_q_tx_pin <= _d_tx_pin;
_q_out_valid <= _d_out_valid;
_q__idx_fsm0 <= reset ? 0 : ( ~in_run ? 1 : _d__idx_fsm0);
end

endmodule

Any time that a block jumps to a next state (doing nothing else), the code block of the target state can be incorporated instead of just jumping, saving a clock cycle.
Doing that incorporation may leave states that are not referenced anymore in other block, so they can be safely deleted.

In practice, this UART example is taking more cycles than really needed, and applying the above may get it to the optimum amount of cycles (i.e., producing up to a bit per cycle)

Original Silice code, for reference:

algorithm uart_tx(
  input	uint8	data,
  output	uint1	tx_pin,
  output	uint1	out_valid,
  input	int32	clock_counter
)
{
  int32 t1 = uninitialized;
  t1 = clock_counter + 1; //constant may change for other baudrate
  tx_pin = 0;
  out_valid = 1;
  while(clock_counter - t1 < 0)
    { out_valid = 0; }
  t1 = t1 + 1;
  uint8 mask = uninitialized;
  mask=1;
  while(mask!=0)
  {
    tx_pin = (data & mask) != 0;
    out_valid = 1;
    while(clock_counter - t1 < 0)
      { out_valid = 0; }
    t1 = t1 + 1;
    mask=mask<<1;
  }
  tx_pin = 1;
  out_valid = 1;
  while(clock_counter - t1 < 0)
    { out_valid = 0; }
}

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

TL;DR good point, have an implementation, but this can potentially break existing code.

Details

I encountered this case in various designs and usually rewrote the loops from:

while (C) {
  A
}
B

to:

while (1) {
  if (C) {
    A
  } else {
    B
    break;
  }
}

which avoids the additional cycle.

My initial thoughts when implementing the loops were that it would be easier for the designer to predict what happens with cycles by not 'collapsing' B into the loop.

However, I now think this would indeed have been a better approach, for the following reasons.

First, while loops are already chaining each others in cases like this:

while (C1) { A1 }
while (C2) { A2 }

But not in cases like that:

while (C1) { A1 }
B
while (C2) { A2 }

Collapsing B in the loop above elegantly solves this, allowing while loops to always chain when no state is needed in between, making things more consistent and avoiding any extra cycle.

Second, in designing hardware it does turn out to be problematic, as this extra cycle is often indeed undesired (e.g. main problem appears when needing to signal the end of a loop at the exact cycle it happens). This leads to these innelegant while (1) loops.

Conversely, with the change in place one can simply go back to the previous behavior with:

while (C) { A }
++:
B

I have a prototype already in place (not pushed).

Unfortunately this change breaks some existing code. Cases are rare but might be hard to track down: For instance PAWSv2 needs added ++: like above in at least a few places, a couple of my designs too.

That makes me very hesitant (I strive to avoid breaking changes). On the plus side, there will be a major draft to master merge soon with already a few (less problematic) syntactic changes. So this may be a good time to do this change...

Let's take a few days to think about it!

from silice.

suarezvictor avatar suarezvictor commented on June 11, 2024

I think you have a dilemma between keeping backwards compatibility, and offering a way of producing faster code.
Why not both? You may pass a flag for retro compatibility (like in a C++ compiler, with -std=c++20)
If the user base is not big (maybe PAWS is the larger third party project that could be affected) I bet the developer would be willing to adapt to the new syntax and add the ++ operator where it's needed, that also offers more readability and make clearer the intent.

The structure while (C) { A } is quite common in my view, it's useful to wait for some signal to be active or condition to be met like while(C){}, useful for example in a AXI device to determine if both valid and ready signals are asserted (another example I was trying). That way, the minimal wait time can be zero and not one or more, allowing bursts at the maximum rate.

The UART example from above can also be a good test case, I expect it to be able to reach a new output value per cycle, so the maximum possible frequency is like dividing the clock by one and not two or more as happens in the current state.

I'll try the suggested structure to see if clock cycles are saved

while (1) {
  if (C) {
    A
  } else {
    B
    break;
  }
}

but the it's less readable and clean than

while (C) { A }
B

May the PAWS developer @rob-ng15 give his opinion?

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

at91rm9200 avatar at91rm9200 commented on June 11, 2024

Hello,

in many places I use the following construct:

algorithm
{
   cs = 0;  // Chip select active
   ++:      // One cycle setup time to first clock pulse
   while (data_must_be_clocked_out) {
      clockout_data;
   }
   cs = 1;  // Chip select inactive. Needs (at least) one cycle hold time after last clock pulse.
} 

If I understand correctly the changes that are being considered, these would need to be reworked.
In general, I would also be in favor of making the generated code faster and more compact.

Regards, Bernd.

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

Thanks for the feedback! Bernd, this is correct, such code would be impacted. Most often the change will have no detrimental effect and a cycle is saved.

Two bad cases that occur in my tests so far:

  • A combinational cycle appears due to the step being removed ; this one is not so bad because an error is issued and I'd add a message to indicate this change can cause it.
  • The disappearing cycle causes a change to logic as a silent side effect, the design compiles and synthesizes but is now incorrect. This is the case that is a cause for concern.

I started experimenting with a feature that tracks code potentially impacted and reports a detailed "change log warning" after compilation. This would be active between versions, and would mitigate the risk that someone would be left wondering why code is no longer working (we all know how hard debugging these things can be!). Overall, this calls for a rigorous and careful reporting of changes, which I did start to put in place and will prioritize. This, together with continuing to avoid such changes in the first place.

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

suarezvictor avatar suarezvictor commented on June 11, 2024

I was thinking a bit about this and I think the example construct with and without 'break' (see below) should produce the exact same results, as users would expect, and that they should know that if any kind optimizations are enabled (for example) all constructs could remove unnecessary cycles. So the adding of extra cycles should be always explicit, as it is being reworked for PAWS. Assumed extra cycles are a bit confusing, at least for me.

while (C) {
  A
}
B

to:

while (1) {
  if (C) {
    A
  } else {
    B
    break;
  }
}

I agree with Rob that the "wait" for some signal would be a very common construct, indeed. And the most clear syntax is while(cond) { ... } , that shouldn't add any cycle if the condition is not met

from silice.

suarezvictor avatar suarezvictor commented on June 11, 2024

In my view, the code duplication should be possible, maybe depending on optimization level. There are situations where the code block will just set a flag, and the synth tool will readily optimize it using combinatorial optimizations that should take into account the FSM state (as part of the logic inputs) with probably no noticeable effect on resource usage. Indeed, cascading would be useful in certain situations, and I think the UART example will require that for max throughput. To me, after the "before" statement, the condution evaluation should be inlined in, with a separete FSM state to handle execution of the while body. In the example on previous message, the code can't be executed single-cycle when the condition is not met (as required for max speed), since there's a inserted cycle after the "before" example regardless on the condition

I mean something like that:

1: begin
  // _top
  $display("before");
  _d__idx_fsm0 = 2;
  // __while__block_1
  if (_c_C) begin
    _d__idx_fsm0 = 2; //jump to next state only if condition is met
  end else begin
    // __block_3
    $display("thing done after");
    _d__idx_fsm0 = 3;
  end
end
2: begin
  // __while__block_1
  if (_c_C) begin
    // __block_2
    // __block_4
    $display("waiting");
    // __block_5
    _d__idx_fsm0 = 2;
  end else begin
    // __block_3
    $display("thing done after");
    _d__idx_fsm0 = 3;
  end
end
3: begin
  // __block_6
  $display("a later state");
  _d__idx_fsm0 = 0;
end

Note how the body of the while statement is not copied on the 1st state

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

If code duplication is allowed, then indeed this would be the output. This is the actual output of the by-hand version I made above:

1: begin
  // _top
  $display("before");
  if (~_c_C) begin
    // __block_1
    // __block_3
    $display("thing done after");
    // __block_4
    _d__idx_fsm0 = 0;
  end else begin
    // __block_2
    // __block_5
    _d__idx_fsm0 = 3;
  end
end
3: begin
  // __while__block_6
  if (_c_C) begin
    // __block_7
    // __block_9
    $display("waiting");
    // __block_10
    _d__idx_fsm0 = 3;
  end else begin
    // __block_8
    $display("thing done after");
    // __block_11
    _d__idx_fsm0 = 0;
  end
end

On the one hand it is tempting to enter these optimizations (I actually just had to implement a mechanism to prevent natural code dup on some if/else optims 😀). On the other hand this starts to endanger a principle I set out initially: finding a balance between 1) leaving the user completely in charge (aka. being low-level) while 2) providing some syntactic conveniences which effect is predictable. There is of course a gray area -- making this fun to explore! -- but optimizations such as code dup. could quickly snowball into a full blown HLS tool, and this would change the expectations. (Also, there is the question of the optimizer being able to make good decisions based on the target hardware ... not simple).

Stay tuned, more cycles are being saved!

from silice.

suarezvictor avatar suarezvictor commented on June 11, 2024

It's is clear for anyone that too much optimization could be dangerous, but to implement for example the UART core, I'd be willing to set optimization level to the sky in order to get one output bit per cycle, instead of having to cut max baurdate in half

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

As a tradeoff, would writing it with code duplication by hand be acceptable? It should then avoid unnecessary cycles, and you stay in full control of code duplication?

from silice.

suarezvictor avatar suarezvictor commented on June 11, 2024

This is a matter of personal taste, but I'd prefer a dangerous optimization flag, than code that's hard to follow and understand
I really think the 'break' version and the 'no break' version should produce the exact same results
Don't you want different optimizations levels?
I can live with the verbose structure since I'd rather generate the code, but what I suggest is just an idea to improve your tool.
Based on what we exchanged, I'd change the current code generator, anyways, so I can have it working earlier ;-)

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

Break and no break do now produce the exact same result (see above). I was only talking about avoiding the one extra cycle at the start of a wait loop, e.g.:

  __display("before");
  while (C) { __display("waiting"); }
  __display("thing done after");

If C is false all along this currently (latest draft branch) will require one cycle. This one cycle cannot be avoided (afaict) without code duplication.

Suggestions are most welcome, keep them coming! 👍

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

Some good news this morning. Here is another case that saves cycles (many in some cases):

Some if/else constructs are such that when taking one branch what comes after the if/else is not reached.

For instance in a loop body:

  if (C) {
    break;
  }
  N

If C is true, N is never reached. Before this was introducing a cycle before N, which can actually
be skipped by pulling N in the else part:

  if (C) {
    break;
  } else {
    N
  }

This is now done automatically.
These optimizations cascade to successive if-s, for instance:

  if (C1) { break; }
  if (C2) { break; }
  if (C3) { break; }
  N

Takes now only one cycle as the if-s and N are automatically nested (versus 3 cycles before).
Interestingly the optimizer would happily duplicate code to achieve more aggressive nesting (this is the case I was mentioning), so I explicitly disallow that. But of course this could be re-enabled ... food for thoughts.

In other news, I am also seeing benefits of these optimization in designs, where the LUT count is reduced thanks to fewer FSM states. Of course fmax could suffer, but in these cases it seemed only beneficial.

I am getting ready to push to draft, so far all tests pass and look good.

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

I'm definitely keeping that in mind ; in fact I was thinking it could be specified per construct, and optionally overridden. For the case of the 'skip' loops -- which I really would like to address as they often appear indeed -- I thought there could be a if_while which performs the re-write above automatically: if_while (C) { A } B => if (C) { while (1) { if (C) { A } else { B break; } } } else { B } (B being the duplicated part).

This would make it clear the designer intends to pay the price of code duplication.

Then, when instantiating a unit, one could ask to aggressively change all while for if_while as an instantiation parameter (but I can see deep cascading cases that will basically generate all possible combinations of code paths, e.g. when B itself contains other such loops).

I'll experiment!
(edited for typos)

from silice.

suarezvictor avatar suarezvictor commented on June 11, 2024

I like the idea of a language extension and that kind of overriding. Happy of being part of this design changes!

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

Hi Rob, thanks a lot for your efforts in porting PAWSv2 to latest!! A drive through beautiful landscapes sounds like perfect to come back with a clear mind and focus!

There are two important things coming up:

  • the first introduces further cycles savings but also potential trouble (the new optimization for if/else) even though I expect this one to be much less problematic than the while loops
  • the second should help a lot and is a tool to visualize how the original source code is decomposed into states/cycles.

I am planning to test these (and hopefully push) tonight. If PAWSv2 is pushed I am happy to give it a try too!

Regarding ++: the easiest by far would be to test in simulation (e.g. verilator) if possible. A generic guideline would be that if the loops are not carefully synchronized with other parts of the design (like waiting on a busy, or BRAM cycles) then the change should not have detrimental side-effects.

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

Hi Rob, thanks for the update. I have been running late on my side (got side-tracked in optimizing the parser, good news on that front as well but different issue :) ). Hopefully it should be possible to get the same result, adding the ++: right after the loop should recover the previous behavior (if not then there is something wrong in my implementation). [edit: typo]

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

I pushed additional changes to draft, most notably the if/else cycle optimizations I mentioned above. The change log report will list everywhere this has an impact (as for while loops).

@rob-ng15 I synthesized PAWSv2 (devel) and the welcome screen shows up, however I seem to still have this issue that my SDcard is not recognized. Hopefully this is just due to my card and not due to these latest changes...

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

sylefeb avatar sylefeb commented on June 11, 2024

Hi Rob, it said "ERROR Insert SDCARD with FAT32 partition" (the screen was animated, the LEDs as well). However, great news: I tried another card and it works! (had to press reset a couple times, that's all). 👍

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

rob-ng15 avatar rob-ng15 commented on June 11, 2024

from silice.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.