OK, here is what the non-blocking statement ( <= ) does, and a bit on writing optimized, fast, compact code. What I'm describing here is a little of what's going on inside the compiler's mind, and how it wires up gates in the FPGA according to the example Verilog code. Remember, there is no single central math unit in these FPGAs: whatever you write is literally constructed by wiring up masses of gates inside the FPGA. All these wires have lengths and propagation delays, and the gates' inputs have capacitive loads, which slow everything down further. This primer gives you an example of a coding technique to be aware of if you want an FPGA core with a minimal gate count that runs as fast as possible and consumes as little power as possible. (If you want a central math processing unit, you literally need to code one into existence and wire it up into your design.)
See the attached image (if you've got a color printer, print this one).
There are 3 simplified versions of code in the illustration.
In V1, 'hsync' is generated just as in the OP nockieboy's first 'Sync Generator' code.
You have the 10-bit register 'h_count' [9:0], which is literally 10 D flip-flops, all clocked in parallel by the positive edge of 'clk'. Their 10 Q outputs ('h_count') are sent through a bunch of gates that add a fixed value of 1. That new 10-bit result is fed back into the 10 D inputs of those same 10 flip-flops, which together are the register 'h_count'.
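The "+1 gates feeding the D inputs" picture can be made explicit. Below is a sketch of V1's counter with the adder and the flip-flops written as separate pieces. The names counter10 and h_count_next are hypothetical. The power-up value assumes your FPGA tool honors register initializers, which many do.

```verilog
// Hypothetical module name: counter10. V1's counter with the D-input logic
// and the flip-flops written as separate pieces.
module counter10 (
    input  wire       clk,
    output reg  [9:0] h_count
);
    // The combinational "bunch of gates": a +1 adder driving the D inputs.
    wire [9:0] h_count_next = h_count + 10'd1;

    // Power-up value (assumes the FPGA tool supports register initial values).
    initial h_count = 10'd0;

    // The 10 D flip-flops: on each rising edge of clk, Q takes the value on D.
    always @(posedge clk)
        h_count <= h_count_next;
endmodule
```

This should synthesize to the same hardware as the one-liner h_count <= h_count + 1; the split form only makes the D/Q feedback loop visible.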
Now, to make the 'hsync' wire, those 10 Qs of 'h_count' are wired to a set of gates that subtract the 10-bit parameter 'HS_STA' from 'h_count' and check that the result is not negative. At the same time, more wiring from the same 10 Qs feeds another mass of gates. These SIMULTANEOUSLY subtract the 10-bit parameter 'HS_END' from 'h_count' and check whether that result is negative. Finally, the 2 comparison results are ANDed together and inverted, generating a single final WIRE result labeled 'hsync'.
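To make the "subtract and check the sign" description concrete, here is a sketch of V1's 'hsync' written with explicit subtractions. The module name hsync_sub is hypothetical, and the sketch assumes HS_STA and HS_END fit in 10 bits. Bit 10 of each 11-bit difference is the borrow, which serves as the "negative" flag. A synthesizer won't necessarily build '>=' exactly this way. The sketch only shows how much logic the two comparisons imply.

```verilog
// Hypothetical module name: hsync_sub. V1's two magnitude comparisons spelled
// out as subtractions with a borrow ("negative") bit.
// Assumes HS_STA and HS_END fit in 10 bits.
module hsync_sub #(
    parameter HS_STA = 10,
    parameter HS_END = 210
) (
    input  wire [9:0] h_count,
    output wire       hsync
);
    // 11-bit differences: bit 10 is set when the subtraction goes negative.
    wire [10:0] diff_sta = {1'b0, h_count} - HS_STA;
    wire [10:0] diff_end = {1'b0, h_count} - HS_END;

    wire ge_sta = ~diff_sta[10];  // h_count - HS_STA is not negative
    wire lt_end =  diff_end[10];  // h_count - HS_END is negative

    // AND the two checks, then invert: hsync is low inside the sync window.
    assign hsync = ~(ge_sta & lt_end);
endmodule
```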
To understand what is happening: the compiler and fitter have to wire up the FPGA with all the gates needed for the 2 comparisons as well as the +1 for the counter itself. Now, find an 8-bit adder/subtractor with a negative flag in a TTL data book. How many gates are there? The FPGA has to do this twice, with 10-bit numbers each time (ignoring compiler optimizations for now). With all that signalling and load, how clean will the 'hsync' wire be if it drives an IO pin as well as other logic in your design? Remember, the compiler does take the load and the routing delay of all these wires into account, even though it never tells you about them specifically. After a design is compiled, all you are told is that X MHz is the best your design will achieve without errors.
V2 is identical to V1 except that I changed 'hsync' from a wire into a register. This is an improvement because, throughout the FPGA, the clock inputs of all the D flip-flops are fed by dedicated hard-wired clock routing, which makes sure they all switch in parallel, at the same instant.

For example, say your clock is 1 MHz. When a clock edge comes in, the 10-bit 'h_count' register changes to its new value. Those 10 bits then go through the mass of gates (the 2 subtractions and the AND gate) that decide whether the number is within 'HS_STA' and 'HS_END', and that result is presented to the D input of the 'hsync' D flip-flop. However (and this is the SYNCHRONOUS magic), the Q output of the 'hsync' register will not change until the next clock edge. So it doesn't matter whether the number comparing takes 1 ns, 5 ns, 6 ns, 10 ns, 15.2 ns, 33.8 ns, or anything less than 1000 ns (the period of the 1 MHz example clock). It doesn't even matter if that D input is full of glitches, noise, or invalid states along the way. As long as the correct result has settled by the next rising clock edge, that Q output will snap to its new value at the same time as the Qs of all the other registers in your design.
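Here is a minimal sketch of how <= produces that "everything snaps together" behavior. The module name nb_demo and the register 'seen' are hypothetical. 'seen' stands in for V2's 'hsync': it is computed in the same clocked block as the counter.

```verilog
// Hypothetical module name: nb_demo. Two registers updated in the same
// clocked block with non-blocking (<=) assignments.
module nb_demo (
    input  wire       clk,
    output reg  [9:0] h_count,
    output reg  [9:0] seen      // the value h_count had just BEFORE the last edge
);
    // Power-up values (assumes tool support for register initializers).
    initial begin
        h_count = 10'd0;
        seen    = 10'd0;
    end

    always @(posedge clk) begin
        // Every right-hand side is evaluated with the values present before
        // the edge; all left-hand sides then update together. So 'seen'
        // captures the OLD h_count, not the freshly incremented one.
        h_count <= h_count + 10'd1;
        seen    <= h_count;
    end
endmodule
```

With blocking '=' on both lines instead, 'seen' would pick up the already-incremented value, and swapping the two statements would change the result. With '<=', the order of the statements inside the block does not matter.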
So with V2, if your 'hsync' register drives an IO pin, or if 'hsync' is used elsewhere in your design, it will be a fast, clean signal that is valid a tiny fraction of time after the rising edge of 'clk'. The one downside is that the 'hsync' result is now delayed by 1 clock. Both assignments use the non-blocking <=, so 'hsync' is computed from the value 'h_count' had before the clock edge, while 'h_count' itself moves on to the next count at that same edge. In this case it is easy enough to compensate by subtracting 1 from the 'HS_STA' and 'HS_END' parameters. This change allows the FPGA compiler to achieve a much higher maximum operating frequency for your design, especially if the 'hsync' signal is used elsewhere in your design.
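Here is a sketch of that compensation, with both thresholds moved one count earlier so the registered 'hsync' lines up with V1's combinational one. The module name sync_gen_v2c is hypothetical. The sketch assumes HS_STA is at least 1 (so that HS_STA - 1 stays non-negative) and that the tool supports power-up values.

```verilog
// Hypothetical module name: sync_gen_v2c. V2 with both thresholds moved one
// count earlier to cancel the 1-clock register delay.
module sync_gen_v2c #(
    parameter HS_STA = 10,    // assumed >= 1 so that HS_STA - 1 is valid
    parameter HS_END = 210
) (
    input  wire       clk,
    output reg  [9:0] h_count,
    output reg        hsync
);
    initial begin
        h_count = 10'd0;
        hsync   = 1'b1;        // outside the active-low pulse at power-up
    end

    always @(posedge clk) begin
        h_count <= h_count + 10'd1;
        // Compare the OLD h_count against thresholds one lower. After the
        // edge h_count has advanced by one, so hsync matches V1 for the
        // new count.
        hsync <= ~((h_count >= HS_STA - 1) & (h_count < HS_END - 1));
    end
endmodule
```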
Now in V3, the goal is to make the design operate as fast as possible no matter what the values of 'HS_STA' and 'HS_END' are. The change I made was to turn the 'hsync' register into an equivalent clocked SR flip-flop. The code I wrote says: if 'h_count' = 'HS_STA', turn on the 'S' of the flip-flop. Look back at the adder/subtractor and the complex web of gates required for that math. Compare all of that with what an '=' test needs: only 10 XOR gates whose outputs feed a single 10-input NOR gate. (That is all it takes to create the function "does A = B?") That A = B function would run in under a nanosecond in the FPGA. Compare that with a 10-bit A minus 10-bit B subtraction, 2 of which were originally needed in parallel. Just the input load of all the gates in the subtractor slows things down, never mind the delay through each gate and the extra length of all the routed wiring. The second half says: if 'h_count' = 'HS_END', turn on the 'R' of the SR flip-flop, clearing it to low. The wiring load of this on the FPGA silicon is a fraction of the code above, with 1 caveat. If 'h_count' never reaches 'HS_STA', 'hsync' will never turn on. And if 'hsync' is on and 'h_count' never reaches 'HS_END', 'hsync' will never turn off.
Two more things to note about V3. First, as written it drives 'hsync' high between 'HS_STA' and 'HS_END'. That is the opposite polarity of V1 and V2, which pull 'hsync' low inside that window, so for an active-low sync like V1's, swap the 1 and 0 assignments. Second, since 'hsync' is still a register updated with <=, V3 has the same 1-clock delay as V2.
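The XOR/NOR equality test described above can be written out directly. The module name eq10 is hypothetical. In practice, when one side is a constant such as HS_STA, synthesis will typically simplify it further, because XOR with a constant bit reduces to either a plain wire or an inverter.

```verilog
// Hypothetical module name: eq10. The A == B test built from 10 XOR gates
// feeding one 10-input NOR.
module eq10 (
    input  wire [9:0] a,
    input  wire [9:0] b,
    output wire       equal
);
    // Each XOR output is 1 where the corresponding bits differ.
    wire [9:0] diff = a ^ b;
    // Reduction NOR of all ten: 1 only when no bit differs, i.e. a == b.
    assign equal = ~|diff;
endmodule
```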
For legibility, the code in the graphic image in plain text:
Many things are omitted for simplicity's sake, but at the top:
module sync_gen ( clk, hsync, h_count );
parameter HS_STA = 10 ; //Turn on hsync signal position
parameter HS_END = 210 ; //Turn off hsync signal position
input clk;
output hsync;
output [9:0] h_count; // 10-bit port, same name and width as the internal 'reg [9:0] h_count'
---------------------------------
V1
---------------------------------
reg [9:0] h_count;
wire hsync;
assign hsync = ~((h_count >= HS_STA) & (h_count < HS_END));
always @(posedge clk) begin
h_count <= h_count + 1; // Without a reset limit, the h_count will run 0 to 1023.
end
---------------------------------
V2
---------------------------------
reg [9:0] h_count;
reg hsync;
always @(posedge clk) begin
h_count <= h_count + 1;
hsync <= ~((h_count >= HS_STA) & (h_count < HS_END));
end
---------------------------------
V3
---------------------------------
reg [9:0] h_count;
reg hsync;
always @(posedge clk) begin
h_count <= h_count + 1;
if (h_count == HS_STA) begin
hsync <= 1;
end else if (h_count == HS_END) begin
hsync <= 0;
end
end
---------------------------------
endmodule
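Here is a sketch of V3 as a module that compiles on its own, with some assumptions labeled:

- The module name sync_gen_v3 is hypothetical.
- H_TOTAL = 800 is an assumed line length (the horizontal total for 640x480, used here only as an example). 'h_count' wraps back to 0 after reaching it. As long as HS_STA and HS_END are below H_TOTAL, both are reached every line, which removes the "never turns on/off" caveat.
- The 1 and 0 are swapped relative to the listing, so the pulse is active-low like V1's.
- Power-up values come from initial blocks.

```verilog
// Hypothetical module name: sync_gen_v3. V3 with an assumed line length
// (H_TOTAL), power-up values, and an active-low pulse like V1/V2.
module sync_gen_v3 #(
    parameter HS_STA  = 10,
    parameter HS_END  = 210,
    parameter H_TOTAL = 800   // assumed line length; HS_STA, HS_END < H_TOTAL
) (
    input  wire       clk,
    output reg  [9:0] h_count,
    output reg        hsync
);
    initial begin
        h_count = 10'd0;
        hsync   = 1'b1;            // start outside the active-low pulse
    end

    always @(posedge clk) begin
        // Wrap at the end of the line so HS_STA and HS_END are both reached
        // every line.
        if (h_count == H_TOTAL - 1)
            h_count <= 10'd0;
        else
            h_count <= h_count + 10'd1;

        // Clocked SR behavior using only equality tests. As with V2, hsync
        // changes one clock after h_count hits each threshold.
        if (h_count == HS_STA)
            hsync <= 1'b0;         // start of sync pulse (active-low)
        else if (h_count == HS_END)
            hsync <= 1'b1;         // end of sync pulse
    end
endmodule
```

As with V2, subtracting 1 from HS_STA and HS_END would line this up exactly with V1.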