Now, all joking aside.
FPGAs have their place in medium-volume applications where a full-custom design is too expensive and the standard parts available are too cumbersome because they would make the system physically large.
Another field is where not all the criteria are known yet but the design needs to start. Simply slap in an FPGA and adapt it as the design evolves. We'll deal with it later...
You will find FPGAs in oscilloscopes from the cheapest no-name brands (because they don't have the money to go full-custom) to Agilent (if there is an afterthought they can simply release a new config pattern and expand the capabilities and lifetime of a product).
The idea of having soft-cores is indeed dwindling. It is counterproductive and wastes gigantic amounts of cells in the FPGA. The best solution is to take a standard microcontroller or microprocessor and pair it with an FPGA. Some FPGA vendors are now offering FPGAs with hard cores (Cortex) in them. That is the real future.
I have been using microcontroller/FPGA combos for a long time. Instead of screwing around and jumping through hoops to get the CPU to do some task that is very difficult because it is timing-related, I simply offload that to the FPGA.
Doing this allows for fast and simple code on the CPU and fast and simple code in the FPGA. A CPU has its strengths and an FPGA has its strengths. Combining the two is a win-win situation.
Now, on the development front: while FPGAs are internally incredibly complex in terms of interconnect and partitioning, you are almost completely shielded from that by the development tools. Even for medium to large designs you will never have to dig into manual constraining, timing closure and other 'nasties'. I have designs running in FPGAs clocked at 200MHz. I didn't do a single timing analysis in the entire design. I didn't even simulate it. The FPGA is fast enough to cope with this. My design is partitioned so that control signals govern the passing of information between blocks. The control signals are written in such a way that they go active at least one clock tick after the data is steady. The only tricky bits where simulation was needed were an arbitration system and the clock domain crossing. Those things are difficult. But if you design fully synchronous logic, in 99% of the cases it will work without having to fidget with the complex bits.
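To make the "control goes active one tick after the data is steady" rule a bit more concrete, here is a minimal sketch of how such a block could look. The module name and all signal names (producer, start, data_out, valid_out) are made up for illustration; this is not lifted from any real design.

```systemverilog
// Sketch only: module and signal names are hypothetical.
module producer (
    input  logic       clk,
    input  logic       start,       // pulse: hand over a new word
    input  logic [7:0] data_in,
    output logic [7:0] data_out,
    output logic       valid_out    // control signal for the next block
);
    logic loaded;                   // remembers that data_out was written on the previous tick

    initial begin
        loaded    = 1'b0;
        valid_out = 1'b0;
    end

    always @(posedge clk) begin
        if (start)
            data_out <= data_in;    // tick N: the data gets registered
        loaded    <= start;         // tick N: note that a load happened
        valid_out <= loaded;        // tick N+1: data has been steady for a full tick, flag it
    end
endmodule
```

The receiving block only looks at data_out while valid_out is high, so it never samples the data while it is still changing.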
Say you take a cheap FPGA and I give you the following task:
Make me an LED display driver for those 14-segment (starburst) displays that can drive a 16-character display, multiplexed, with PWM dimming, a character decoder and an SPI interface. This may sound overwhelming.
In reality the implementation is very simple and you can design block by block, try the block in a simulator, and then move to the next block.
SPI interface:
Let's assume we do a 16-bit transfer: the first 8 bits hold '0000' followed by a 4-bit character address, the next 8 bits hold the character code.
0000_1001_0110_0101 as an SPI transfer would mean:
------ 9 6 5 : character 9 is 0x65
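If you want to convince yourself of the field positions before writing the real block, a throwaway simulation-only module is enough. The module name frame_fields_demo is made up; the bit positions follow the layout described above.

```systemverilog
// Simulation-only sketch: splits the example frame into its fields.
module frame_fields_demo;
    logic [15:0] frame;
    logic [3:0]  addr;
    logic [7:0]  code;

    initial begin
        frame = 16'b0000_1001_0110_0101;  // the example transfer from the text
        addr  = frame[11:8];              // 4-bit character address
        code  = frame[7:0];               // 8-bit character code
        $display("address = %0d, code = 0x%02h", addr, code);
        // prints: address = 9, code = 0x65
    end
endmodule
```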
here we go :
module spi(input MOSI, CLK, CE, output reg [7:0] char, input [3:0] address);
reg [15:0] shifter;
reg [7:0] char0,char1,char2,char3,char4,char5,char6,char7,char8,char9, /* ... */ char15; // we need 16 here
always @(posedge CLK)
    if (!CE) begin
        shifter <= {shifter[14:0], MOSI}; // if CE is low : shift MOSI into a shift register controlled by the CLK
    end
    else begin
        // CE is high : the transfer is complete, decode and store it
        // note : this only happens on a CLK edge, so the master must give at least one more CLK edge after raising CE
        case (shifter[11:8])
            4'b0000 : char0 <= shifter[7:0];
            4'b0001 : char1 <= shifter[7:0];
            // ... do this for the 16 possible combinations
        endcase
    end
// look ma : i just made an SPI slave device that can take in a 16 bit command word, decode it and store the data in 16 registers...
// now how do we get the data out to process it internally ?
// well
always_comb begin
    case (address)
        4'b0000 : char = char0;
        4'b0001 : char = char1;
        // ... continue the decoder here for all 16 locations
    endcase
end
endmodule
There you go. A block in the FPGA can now randomly access the stored characters loaded through SPI...
Oh, wait, dimming... we had 4 empty bits, right? Let's use the first bit to switch between 'loading text' and setting options.
If I were breadboarding this with chips this would be an 'oh fuck' moment... but since this is an FPGA... piece of cake:
module spi(input MOSI, CLK, CE, output reg [7:0] char, input [3:0] address, output reg [7:0] brightness); // add an 8 bit output for brightness
reg [15:0] shifter;
reg [7:0] char0,char1,char2,char3,char4,char5,char6,char7,char8,char9, /* ... */ char15;
always @(posedge CLK)
    if (!CE) begin
        shifter <= {shifter[14:0], MOSI}; // if CE is low : shift MOSI into a shift register controlled by the CLK
    end
    else begin
        if (shifter[15]) brightness <= shifter[7:0]; // if the bit is set : it's brightness
        else // if not : it's characters.
            case (shifter[11:8])
                4'b0000 : char0 <= shifter[7:0];
                4'b0001 : char1 <= shifter[7:0];
                // ...
                4'b1111 : char15 <= shifter[7:0];
            endcase
    end
// look ma : i just made an SPI slave device that can take in a 16 bit command word, decode it and store the data in 16 registers...
// now how do we get the data out to process it internally ?
// well
always_comb begin
    case (address)
        4'b0000 : char = char0;
        4'b0001 : char = char1;
        // ... continue the decoder here for all 16 locations
    endcase
end
endmodule
Now, purists are going to say: you could have made an array of chars, you could have done away with the if (shifter[15]) and simply made a bigger case statement testing for the bit there, like so:
case (shifter[15:8])
    8'b1000_0000 : brightness <= ...
    8'b0000_0000 : char0 <= ...
    ...
    8'b0000_1111 : char15 <= ...
In the end it doesn't matter at all. Yes, you could write it more compactly and cleaner, but the end result will be EXACTLY the same. This is different from writing software. In software, the more lines of code you plonk down, the more instructions you end up with and the more clock cycles the CPU will take to execute them.
In an FPGA this is NOT the case. Your brainfart gets translated into logic equations. These get expanded first, then minimized and mapped into lookup tables. The end result is: 1 clock tick to do all that crap. Irrespective of how you wrote it, long-winded, elegant, purist format, doesn't matter. The language statements all create boolean logic. Whether the source ended up as a very long equation or a short one doesn't matter: after logic minimization both yield the same solution, simply because both describe the same logic function.
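For comparison, this is roughly what the purist array version could look like. The module name spi_array is made up, and it keeps the same assumption as the long-hand version: CLK has to tick at least once more after CE goes high for the store to happen. It should behave the same from the outside; whether the tool maps the array onto flip-flops or onto a small RAM is up to the synthesizer.

```systemverilog
// Hypothetical array-based variant of the spi block; behaviour is meant to match the long-hand one.
module spi_array (
    input  logic       MOSI, CLK, CE,
    input  logic [3:0] address,
    output logic [7:0] char,
    output logic [7:0] brightness
);
    logic [15:0] shifter;
    logic [7:0]  chars [0:15];                     // one array instead of char0 .. char15

    always @(posedge CLK) begin
        if (!CE)
            shifter <= {shifter[14:0], MOSI};      // CE low : shift MOSI in
        else if (shifter[15])
            brightness <= shifter[7:0];            // top bit set : it's brightness
        else
            chars[shifter[11:8]] <= shifter[7:0];  // otherwise : store the character
    end

    assign char = chars[address];                  // random-access read port
endmodule
```

Same ports, same function, a lot less typing.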
So now we have access to the stored characters... how do we scan them?
Well, with a counter that selects one of them, sends it to the decoder and selects the common line of one of the displays.
module scanner(input clk, output reg [3:0] address, output reg [15:0] columns);
always @(posedge clk) begin
    address <= address + 1; // no need for other code. when we hit 1111 it will roll to 0000. basic nature of flipflop logic
end
always_comb begin // make me some combinatorial crap here
    columns = 16'd1 << address; // make only 1 bit high , bit is determined by value of 'address'
end
endmodule
If I now tie the address lines of this module to the address lines of my spi block, the spi block will spit out the received characters in sequential order. At the same time my outputs select one of the column drivers in the display.
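One thing to keep in mind: the scanner as written steps to the next character on every clock. Since the display gets scanned at a much slower pace than the logic runs, one way to slow it down is a small prescaler in front of the address counter. This is only a sketch; the module name scanner_slow and the DIV_BITS parameter (and its value) are assumptions, pick whatever suits your clock.

```systemverilog
// Hypothetical slow-scan variant of the scanner; DIV_BITS is an assumed parameter.
module scanner_slow #(
    parameter int DIV_BITS = 12                // step once every 2**DIV_BITS clocks (assumed value)
) (
    input  logic        clk,
    output logic [3:0]  address,
    output logic [15:0] columns
);
    logic [DIV_BITS-1:0] divider;

    initial begin
        divider = '0;
        address = 4'd0;
    end

    always @(posedge clk) begin
        divider <= divider + 1'b1;             // free-running prescaler, wraps by itself
        if (&divider)                          // all ones : happens once every 2**DIV_BITS clocks
            address <= address + 1'b1;         // step to the next character, rolls from 1111 to 0000
    end

    always_comb columns = 16'd1 << address;    // only 1 bit high, selected by 'address'
endmodule
```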
All I need is a character decoder:
module starburst(input [7:0] char, output reg [13:0] bitmap);
always_comb case (char)
    8'd65 : bitmap = 14'b11_0000_1000_1010; // bitmap for letter 'A'
    8'd66 : bitmap = 14'b11_1001_1001_1001; // bitmap for letter 'B'
    // ... and so on
    default : bitmap = 14'b00_0000_0000_0000; // anything we didn't define : display nothing
endcase
endmodule
Connect the char bus together and you are done. This thing will scan the characters, decode each character into the appropriate LED bitmap, select the right column and drive the LEDs.
Oh.. dimming... well, I leave that up to you: simply throw in a free-running counter and compare it with the dimming value. If the counter is larger than the dimming value: force the bitmap to all 0; otherwise: pass the bitmap. This is simply a small block between the output of the starburst module and the real LED pins.
Along the lines of:
pwm <= pwm + 1;
if (pwm > brightness) bitmap_out = 0; else bitmap_out = bitmap; ...
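Fleshed out a little, that dimming block could be something like the sketch below. The module name dimmer and the 8-bit counter width are my assumptions (the width simply matches the 8-bit brightness value); the comparison is the same one as in the two lines above.

```systemverilog
// Hypothetical dimming block that sits between the starburst decoder and the LED pins.
module dimmer (
    input  logic        clk,
    input  logic [7:0]  brightness,   // value received over SPI
    input  logic [13:0] bitmap,       // segments from the character decoder
    output logic [13:0] bitmap_out    // segments going to the LED drivers
);
    logic [7:0] pwm;

    initial pwm = 8'd0;

    always @(posedge clk)
        pwm <= pwm + 1'b1;            // free-running counter, wraps from 255 to 0

    // counter above the brightness value : all segments off, otherwise pass the bitmap
    always_comb bitmap_out = (pwm > brightness) ? 14'd0 : bitmap;
endmodule
```

A higher brightness value keeps the segments on for a larger part of every 256-tick period. For even dimming, one PWM period should fit inside the time the scanner spends on a single character.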
So in a few lines, each a very simple step, I have created a fairly complex bit of logic. It reads an SPI datastream, decodes instructions and data, stores them, scans the stored information, decodes that into character bitmaps, PWMs the output and drives a 16-character, 14-segment LED display...
That is a VERY complex thing if you were to try doing it with loose TTL or CMOS chips.... The schematic alone is a nightmare. Working out all the equations is a nightmare, soldering it is a nightmare.
Plonk down an FPGA, think a bit about what you need, partition it into simple chunks, write some simple code and leave the rest to the synthesizer. It will work out all the details.
Do I need to do timing analysis for the above? Hell no. That SPI clock is 16MHz. The FPGA laughs at that...
The character decoder may be large and have some trouble, but since the display is scanned at a much slower pace you won't even see it there. You could simply throw in another latch or inject a dead time between character transitions to mask that off.
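The "throw in another latch" option can be as small as one register stage on the way to the pins. A minimal sketch, with a made-up module name (output_stage) and made-up port names; the column select is registered together with the bitmap so the two stay lined up.

```systemverilog
// Hypothetical output register stage: hides decoder glitches from the LED pins.
module output_stage (
    input  logic        clk,
    input  logic [13:0] bitmap_in,    // from the decoder (or the dimming block)
    input  logic [15:0] columns_in,   // from the scanner
    output logic [13:0] bitmap_out,   // to the segment drivers
    output logic [15:0] columns_out   // to the column drivers
);
    always @(posedge clk) begin
        bitmap_out  <= bitmap_in;     // sampled once per clock, after the decoder has settled
        columns_out <= columns_in;    // delayed by the same one tick to stay aligned
    end
endmodule
```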
it ain't something to be scared of