Due to the incomplete documentation provided by Gowin regarding usage of the CLU as an ALU in modes 'countup', 'countdown', and 'countupdown', I've spent the last few hours running multiple tests on the 'tang nano 9k' dev board, using the 'GW1NR-9c' chip. My goal was to be able to simulate the functions of the ALU in the oss-cad-suite toolchain.
I want to share my results because this info is not on Google, and because I lose stuff like this very regularly, so here it will be easy to find.
All constructive criticism is welcome.
truth tables
alu_simulation_wrapper
(
    .I0( alu_I0 ),
    .I1( settings[2] ),
    .I3( settings[1] ),
    .CIN( settings[0] ),
    .COUT( alup_cout ),
    .SUM( alup_sum )
);
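Output in binary truth table format. Each row reads:
I0|I1|CIN|I3 COUT|SUM
In the equations, A = I0, B = I1, C = CIN, D = I3. For each type, the first equation is COUT and the second is SUM.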
`ALU_TYPE_ADD
TYPE 0 BC + AC + AB = |{ &{I1, CIN}, &{I0, CIN}, &{I0, I1} };
A'B'C + A'BC' + AB'C' + ABC = |{ &{!I0, !I1, CIN}, &{!I0, I1, !CIN}, &{I0, !I1, !CIN}, &{I0, I1, CIN} };
0|0|0|0 0|0
1|0|0|0 0|1
0|1|0|0 0|1
1|1|0|0 1|0
0|0|1|0 0|1
1|0|1|0 1|0
0|1|1|0 1|0
1|1|1|0 1|1
0|0|0|1 0|0
1|0|0|1 0|1
0|1|0|1 0|1
1|1|0|1 1|0
0|0|1|1 0|1
1|0|1|1 1|0
0|1|1|1 1|0
1|1|1|1 1|1
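For simulating outside Gowin's tools, the two TYPE 0 equations can be dropped straight into a small module. This is only a minimal sketch, not Gowin's primitive source and not the attached ALU.v. The module name alu_add_bit_model is made up for illustration, and I3 is deliberately ignored because the table shows it has no effect in this mode.

```verilog
// Hypothetical behavioral model of one ALU bit in ADD mode (TYPE 0).
// Built only from the equations and truth table above.
module alu_add_bit_model (
    input  wire I0,
    input  wire I1,
    input  wire I3,    // ignored in ADD mode; kept so the port list matches the wrapper
    input  wire CIN,
    output wire COUT,
    output wire SUM
);
    // COUT = BC + AC + AB (majority of the three inputs)
    assign COUT = |{ &{I1, CIN}, &{I0, CIN}, &{I0, I1} };
    // SUM = A'B'C + A'BC' + AB'C' + ABC, which is a three-input XOR
    assign SUM  = I0 ^ I1 ^ CIN;
endmodule
```

In other words, one bit in this mode behaves like a plain full adder, so {COUT, SUM} is the two-bit sum I0 + I1 + CIN.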
`ALU_TYPE_SUB
TYPE 1 B'C + AB' + AC = |{ &{!I1, CIN}, &{I0, !I1}, &{I0, CIN} };
A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{I0, !I1, CIN}, &{I0, I1, !CIN} };
0|0|0|0 0|1
1|0|0|0 1|0
0|1|0|0 0|0
1|1|0|0 0|1
0|0|1|0 1|0
1|0|1|0 1|1
0|1|1|0 0|1
1|1|1|0 1|0
0|0|0|1 0|1
1|0|0|1 1|0
0|1|0|1 0|0
1|1|0|1 0|1
0|0|1|1 1|0
1|0|1|1 1|1
0|1|1|1 0|1
1|1|1|1 1|0
`ALU_TYPE_ADDSUB
TYPE 2 AC + B'CD' + BCD + AB'D' + ABD = |{ &{I0, CIN}, &{!I1, CIN, !I3}, &{I1, CIN, I3}, &{I0, !I1, !I3}, &{I0, I1, I3} };
A'B'C'D' + A'B'CD + A'BC'D + A'BCD' + AB'C'D + AB'CD' + ABC'D' + ABCD = |{ &{!I0, !I1, !CIN, !I3}, &{!I0, !I1, CIN, I3}, &{!I0, I1, !CIN, I3}, &{!I0, I1, CIN, !I3}, &{I0, !I1, !CIN, I3}, &{I0, !I1, CIN, !I3}, &{I0, I1, !CIN, !I3}, &{I0, I1, CIN, I3} };
0|0|0|0 0|1
1|0|0|0 1|0
0|1|0|0 0|0
1|1|0|0 0|1
0|0|1|0 1|0
1|0|1|0 1|1
0|1|1|0 0|1
1|1|1|0 1|0
0|0|0|1 0|0
1|0|0|1 0|1
0|1|0|1 0|1
1|1|0|1 1|0
0|0|1|1 0|1
1|0|1|1 1|0
0|1|1|1 1|0
1|1|1|1 1|1
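The long TYPE 2 equations collapse quite a bit once you notice that I3 only decides whether I1 is inverted. Here is a minimal sketch under that reading. The module name alu_addsub_bit_model and the internal wire b_eff are my own hypothetical names, not anything from Gowin.

```verilog
// Hypothetical behavioral model of one ALU bit in ADDSUB mode (TYPE 2).
module alu_addsub_bit_model (
    input  wire I0,
    input  wire I1,
    input  wire I3,    // 1 = add, 0 = subtract (per the truth table above)
    input  wire CIN,
    output wire COUT,
    output wire SUM
);
    // I3 = 1 passes I1 through unchanged; I3 = 0 inverts it.
    wire b_eff = I1 ^ ~I3;

    // With the effective operand in place, the bit is an ordinary full adder.
    assign COUT = (I0 & b_eff) | (I0 & CIN) | (b_eff & CIN);
    assign SUM  = I0 ^ b_eff ^ CIN;
endmodule
```

With I3 low this reproduces the TYPE 1 (SUB) rows, and with I3 high the TYPE 0 (ADD) rows.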
`ALU_TYPE_NOTEQ
TYPE 3 C + A'B + AB' = |{ CIN, &{!I0, I1}, &{I0, !I1} };
A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{ I0, !I1, CIN}, &{I0, I1, !CIN} };
0|0|0|0 0|1
1|0|0|0 1|0
0|1|0|0 1|0
1|1|0|0 0|1
0|0|1|0 1|0
1|0|1|0 1|1
0|1|1|0 1|1
1|1|1|0 1|0
0|0|0|1 0|1
1|0|0|1 1|0
0|1|0|1 1|0
1|1|0|1 0|1
0|0|1|1 1|0
1|0|1|1 1|1
0|1|1|1 1|1
1|1|1|1 1|0
`ALU_TYPE_GREATER_EQ
TYPE 4 B'C + AB' + AC = |{ &{!I1, CIN}, &{I0, !I1}, &{I0, CIN} };
A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{I0, !I1, CIN}, &{I0, I1, !CIN} };
0|0|0|0 0|1
1|0|0|0 1|0
0|1|0|0 0|0
1|1|0|0 0|1
0|0|1|0 1|0
1|0|1|0 1|1
0|1|1|0 0|1
1|1|1|0 1|0
0|0|0|1 0|1
1|0|0|1 1|0
0|1|0|1 0|0
1|1|0|1 0|1
0|0|1|1 1|0
1|0|1|1 1|1
0|1|1|1 0|1
1|1|1|1 1|0
`ALU_TYPE_LESS_EQ
TYPE 5 A'C + A'B + BC = |{ &{!I0, CIN}, &{!I0, I1}, &{I1, CIN} };
A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{I0, !I1, CIN}, &{I0, I1, !CIN} };
0|0|0|0 0|1
1|0|0|0 0|0
0|1|0|0 1|0
1|1|0|0 0|1
0|0|1|0 1|0
1|0|1|0 0|1
0|1|1|0 1|1
1|1|1|0 1|0
0|0|0|1 0|1
1|0|0|1 0|0
0|1|0|1 1|0
1|1|0|1 0|1
0|0|1|1 1|0
1|0|1|1 0|1
0|1|1|1 1|1
1|1|1|1 1|0
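The three compare types (3, 4 and 5) share the same SUM equation and only differ in COUT. One way to read the COUT tables: when I0 and I1 are equal, CIN is passed through, and when they differ the bit decides the result itself. The sketch below follows that reading. The module name alu_cmp_bit_model and the MODE parameter are hypothetical, chosen to reuse the type numbers from the tables.

```verilog
// Hypothetical behavioral model of one ALU bit in the compare modes.
module alu_cmp_bit_model #(
    parameter MODE = 3   // 3 = NOTEQ, 4 = GREATER_EQ, 5 = LESS_EQ
) (
    input  wire I0,
    input  wire I1,
    input  wire CIN,
    output wire COUT,
    output wire SUM
);
    wire equal = ~(I0 ^ I1);

    // Value driven onto COUT when the two bits differ:
    //   NOTEQ      -> always 1
    //   GREATER_EQ -> I0 (1 only when I0 = 1, I1 = 0)
    //   LESS_EQ    -> I1 (1 only when I0 = 0, I1 = 1)
    wire differ_value = (MODE == 3) ? 1'b1 :
                        (MODE == 4) ? I0   : I1;

    assign COUT = equal ? CIN : differ_value;
    // SUM is identical for all three types: XNOR of the three inputs.
    assign SUM  = ~(I0 ^ I1 ^ CIN);
endmodule
```

Chained from the least significant bit upward, the most significant differing bit wins, which is what you want from a magnitude comparator.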
`ALU_TYPE_COUNTUP
TYPE 6 AC = &{I0, CIN};
A'C + AC' = |{ &{!I0, CIN}, &{I0, !CIN} };
0|0|0|0 0|0
1|0|0|0 0|1
0|1|0|0 0|0
1|1|0|0 0|1
0|0|1|0 0|1
1|0|1|0 1|0
0|1|1|0 0|1
1|1|1|0 1|0
0|0|0|1 0|0
1|0|0|1 0|1
0|1|0|1 0|0
1|1|0|1 0|1
0|0|1|1 0|1
1|0|1|1 1|0
0|1|1|1 0|1
1|1|1|1 1|0
`ALU_TYPE_COUNTDOWN
TYPE 7 C + A = |{ CIN, I0 };
A'C' + AC = |{ &{!I0, !CIN}, &{I0, CIN} };
0|0|0|0 0|1
1|0|0|0 1|0
0|1|0|0 0|1
1|1|0|0 1|0
0|0|1|0 1|0
1|0|1|0 1|1
0|1|1|0 1|0
1|1|1|0 1|1
0|0|0|1 0|1
1|0|0|1 1|0
0|1|0|1 0|1
1|1|0|1 1|0
0|0|1|1 1|0
1|0|1|1 1|1
0|1|1|1 1|0
1|1|1|1 1|1
`ALU_TYPE_COUNTUPDOWN
TYPE 8 CD' + AD' + AC = |{ &{CIN, !I3}, &{I0, !I3}, &{I0, CIN} };
A'C'D' + A'CD + AC'D + ACD' = |{ &{!I0, !CIN, !I3}, &{!I0, CIN, I3}, &{I0, !CIN, I3}, &{I0, CIN, !I3} };
0|0|0|0 0|1
1|0|0|0 1|0
0|1|0|0 0|1
1|1|0|0 1|0
0|0|1|0 1|0
1|0|1|0 1|1
0|1|1|0 1|0
1|1|1|0 1|1
0|0|0|1 0|0
1|0|0|1 0|1
0|1|0|1 0|0
1|1|0|1 0|1
0|0|1|1 0|1
1|0|1|1 1|0
0|1|1|1 0|1
1|1|1|1 1|0
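The TYPE 8 table is just the TYPE 7 (countdown) rows when I3 is low and the TYPE 6 (countup) rows when I3 is high, with I1 ignored. A minimal sketch of that, assuming nothing beyond the table above. The module name alu_cupcdn_bit_model is hypothetical.

```verilog
// Hypothetical behavioral model of one ALU bit in COUNTUPDOWN mode (TYPE 8).
module alu_cupcdn_bit_model (
    input  wire I0,
    input  wire I3,    // 1 = count up, 0 = count down (per the truth table above)
    input  wire CIN,
    output wire COUT,
    output wire SUM
);
    // Count up:   COUT = I0 & CIN, SUM = I0 XOR  CIN
    // Count down: COUT = I0 | CIN, SUM = I0 XNOR CIN
    assign COUT = I3 ? (I0 & CIN) : (I0 | CIN);
    assign SUM  = I0 ^ CIN ^ ~I3;
endmodule
```

Note the carry polarity flips between the two directions, which is why the CIN seed for the chain also has to flip.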
`ALU_TYPE_MULTIPLIER
TYPE 9 ABC = &{ I0, I1, CIN };
A'C + B'C + ABC' = |{ &{!I0, CIN}, &{!I1, CIN}, &{I0, I1, !CIN} };
0|0|0|0 0|0
1|0|0|0 0|0
0|1|0|0 0|0
1|1|0|0 0|1
0|0|1|0 0|1
1|0|1|0 0|1
0|1|1|0 0|1
1|1|1|0 1|0
0|0|0|1 0|0
1|0|0|1 0|0
0|1|0|1 0|0
1|1|0|1 0|1
0|0|1|1 0|1
1|0|1|1 0|1
0|1|1|1 0|1
1|1|1|1 1|0
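One way to read the TYPE 9 table is as a half adder whose first operand is the AND of I0 and I1, i.e. a one-bit product added to whatever arrives on CIN. That is my interpretation of the table, not something from Gowin's documentation, and the module name alu_mult_bit_model is made up. A minimal sketch:

```verilog
// Hypothetical behavioral model of one ALU bit in MULTIPLIER mode (TYPE 9).
module alu_mult_bit_model (
    input  wire I0,
    input  wire I1,
    input  wire CIN,
    output wire COUT,
    output wire SUM
);
    // One-bit product of the two data inputs.
    wire product = I0 & I1;

    // Half adder of the product and CIN:
    //   COUT = ABC
    //   SUM  = A'C + B'C + ABC' = (AB) XOR C
    assign COUT = product & CIN;
    assign SUM  = product ^ CIN;
endmodule
```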
So after about 2 tries, I cheated and used an online K-map calculator to pinpoint my errors.
http://www.32x8.com/
The effect of the ALU_CHAIN.CIN value is as follows:
+  `ALU_TYPE_ADD .CIN should be set to 'LOW', otherwise the result is one too high (I0 + I1 + 1). .COUT indicates overflow when HIGH.
-  `ALU_TYPE_SUB .CIN should be set to 'HIGH', otherwise the result is one too low (I0 - I1 - 1). .COUT indicates underflow when LOW.
+- `ALU_TYPE_ADDSUB The effect is based on I3: I3('b0) for subtraction, I3('b1) for addition. (See .CIN & .COUT for ADD and SUB above.)
!= `ALU_TYPE_NOTEQ .CIN should be set to 'LOW', otherwise .COUT will always return HIGH. .COUT is LOW when I0 == I1 and HIGH when I0 != I1.
>= `ALU_TYPE_GREATER_EQ .CIN should be set to 'HIGH', otherwise the function is I0 > I1. .COUT is LOW when I0 < I1 and HIGH when I0 >= I1.
<= `ALU_TYPE_LESS_EQ .CIN should be set to 'HIGH', otherwise the function is I0 < I1. .COUT is LOW when I0 > I1 and HIGH when I0 <= I1.
++ `ALU_TYPE_COUNTUP .CIN should be set to 'HIGH', otherwise the function is paused. .COUT indicates overflow when HIGH.
-- `ALU_TYPE_COUNTDOWN .CIN should be set to 'LOW', otherwise the function is paused. .COUT indicates underflow when LOW.
+- `ALU_TYPE_COUNTUPDOWN The effect is based on I3: I3('b0) counts down, I3('b1) counts up. (See .CIN & .COUT for COUNTUP and COUNTDOWN above.)
*  `ALU_TYPE_MULTIPLIER It's complicated.
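The .CIN seeds in the list above are easier to see on a multi-bit chain than on a single bit. Below is a minimal sketch of a 4-bit chain in SUB mode, where each bit's COUT feeds the next bit's CIN. Both module names and the port names a, b, cin, diff and cout are hypothetical; the bit equations are the TYPE 1 ones from the tables.

```verilog
// One bit in SUB mode, using the TYPE 1 equations from the tables.
module alu_sub_bit_model (
    input  wire I0, I1, CIN,
    output wire COUT, SUM
);
    assign COUT = (~I1 & CIN) | (I0 & ~I1) | (I0 & CIN);
    assign SUM  = ~(I0 ^ I1 ^ CIN);
endmodule

// Hypothetical N-bit ripple chain built from the bit model above.
module alu_sub_chain_model #(
    parameter WIDTH = 4
) (
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    input  wire             cin,   // seed for bit 0: HIGH gives a - b, LOW gives a - b - 1
    output wire [WIDTH-1:0] diff,
    output wire             cout   // with cin HIGH: LOW means underflow (a < b)
);
    wire [WIDTH:0] carry;
    assign carry[0] = cin;

    genvar i;
    generate
        for (i = 0; i < WIDTH; i = i + 1) begin : g_bit
            alu_sub_bit_model u_bit (
                .I0  (a[i]),
                .I1  (b[i]),
                .CIN (carry[i]),
                .COUT(carry[i+1]),
                .SUM (diff[i])
            );
        end
    endgenerate

    assign cout = carry[WIDTH];
endmodule
```

The same chain also shows the GREATER_EQ behaviour, since TYPE 4 has the same equations as TYPE 1: with cin HIGH the final cout is a >= b, and with cin LOW it becomes a > b.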
Attached are:
AUL_truth_tables.v - used to generate the truth tables
ALU.v - used to simulate the ALU primitives for the tang nano 9k.
(rant)
This has been a challenge.
There has to be an easier way to extract this information.
The first 3 chapters of my text book.
I have had great difficulty in getting Gowin's toolchain to synthesize simple counters, FIFO pointers, or really any arithmetic at 100MHz.
Sometimes it will infer an ALU, most of the time it will not. This is a combination of my inexperience with Verilog and the IDE settings.
For example:
always @( posedge clk ) begin
    if( enable ) begin // when running
        counter <= counter + 'd1;
        if( counter >= reset_value ) begin
            strobe_ff <= 1'b1;
        end
    end
    if( !rst_n || strobe_ff ) begin // reset the counter on reset
        counter <= 'd1;
        strobe_ff <= 1'b0;
    end
end
Fmax values are taken from the timing report, not the inaccurate synthesis report.
counter is defined as
    reg unsigned [15:0] counter = 0;
The critical path here is 'counter <= counter + 'd1;'. Because counter is also set elsewhere (line #9), there is a mux in front of it, and I get an Fmax of around 80~95MHz.
By swapping that line out for a declared ALU, Fmax reaches between 150~175MHz.
When I remove the reset and allow a free-running counter (removing the mux), Fmax increases further.
Using 'CIN' as an enable port removed another level of logic.
And by adding additional registers to calculate and store the reset value, I get an Fmax of 197.8MHz.
I'm so close to 200 I can almost touch it.
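To make the "declared ALU" and "CIN as enable" steps concrete, here is a minimal structural sketch of a free-running counter built from a count-up chain. It is not my actual project code. The bit module alu_cup_bit_model is a hypothetical stand-in using the TYPE 6 equations so the example simulates on its own, and on hardware you would instantiate the vendor's ALU primitive in its place. It says nothing about Fmax by itself; it only shows the structure.

```verilog
// Stand-in for one ALU bit in COUNTUP mode (TYPE 6 equations from the tables).
module alu_cup_bit_model (
    input  wire I0, CIN,
    output wire COUT, SUM
);
    assign COUT = I0 & CIN;
    assign SUM  = I0 ^ CIN;
endmodule

// Hypothetical free-running counter: no reset mux in front of the register,
// and the enable drives CIN of bit 0 instead of gating the register.
module chain_counter_sketch #(
    parameter WIDTH = 16
) (
    input  wire             clk,
    input  wire             enable,
    output reg  [WIDTH-1:0] counter = 0
);
    wire [WIDTH:0]   carry;
    wire [WIDTH-1:0] next;

    // CIN LOW pauses a count-up chain, so enable can be wired straight in.
    assign carry[0] = enable;

    genvar i;
    generate
        for (i = 0; i < WIDTH; i = i + 1) begin : g_bit
            alu_cup_bit_model u_bit (
                .I0  (counter[i]),
                .CIN (carry[i]),
                .COUT(carry[i+1]),
                .SUM (next[i])
            );
        end
    endgenerate

    // The register loads the chain output every cycle, with no mux in between.
    always @( posedge clk ) begin
        counter <= next;
    end
endmodule
```

The terminal-count compare and strobe would then live in their own registers next to this chain rather than in the counter's data path.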
Studying the 'Floor planner' (the placement constraints editor) shows that the ALU_CHAIN is broken up across multiple slices, and these slices are not adjacent to one another. I'm hoping a few constraints will fix the issue and get me past the 200 mark.
(Future me here: the synthesis and PNR tools have settings that are, by default, set for area, not speed. Changing these now infers an ALU for everything, tightly packed and fast. Still not at 200MHz yet.)
SoC SDRAM runs at 208MHz; that's my goal. Their protected IP HSFIFO builds at speeds approaching 250MHz, so I know it is doable.