Following up on one of my replies to your 'what's next?' question from a few posts back: I've plucked an FPU example from GitHub and tidied it up a bit (see attached zip file).
I was thinking the best way to fit it into the GPU project was to have it sit as an instantiated module in GPU_DECA_DDR3_top, with its inputs (A, B and opcode) tied directly to GPU RAM addresses. The host could then write two 32-bit floating-point numbers to the appropriate memory addresses, write the desired 2-bit opcode (to select an add, subtract, multiply or divide operation), then read the result memory location for the 32-bit result. No IO ports used, no additional ports or HDL in Bridgette - just a few memory writes and a read (or four, to get the full 32-bit value) for the host.
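Roughly what I have in mind, as a sketch only - the parameter values, port names and wiring are all placeholders, and the real FPU module in the attached zip will have its own port list:

```verilog
// Hypothetical sketch: memory-mapped FPU sitting in GPU_DECA_DDR3_top.
// Addresses and names are made up for illustration.
parameter FPU_A_ADDR   = 32'h0001_0000;  // host writes operand A here
parameter FPU_B_ADDR   = 32'h0001_0004;  // host writes operand B here
parameter FPU_OP_ADDR  = 32'h0001_0008;  // 2-bit opcode: add/sub/mul/div
parameter FPU_RES_ADDR = 32'h0001_000C;  // host reads the 32-bit result here

wire [31:0] fpu_a, fpu_b, fpu_result;
wire [1:0]  fpu_op;

// fpu_a/fpu_b/fpu_op would be fed from the above DDR3 addresses by some
// glue logic, and fpu_result written back to FPU_RES_ADDR the same way.
fpu fpu_inst (
    .A      (fpu_a),
    .B      (fpu_b),
    .opcode (fpu_op),
    .result (fpu_result)
);
```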
Now, the module looks pretty quick - as far as I can tell, as long as A, B and OPCODE are valid then so is the result, which is pretty darn fast (to me, anyway). If that's the case, I don't see the need for an 'enable' or strobe to tell the module to calculate the result; I might as well just leave it running,
with its inputs and outputs constantly reading/writing the chosen memory locations. The exact memory addresses would be specified by parameters in the top-level HDL. It's the constant reading/writing of DDR3 memory that causes me the most concern, though, hence this post.
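One way I could imagine limiting the bandwidth hogging without adding an IO port: snoop the host's write port, latch A and B into local registers as they're written, and treat a write to the opcode address as the trigger for a single result write-back. Again, purely a sketch - all signal names here are placeholders for whatever the DDR3 controller actually exposes:

```verilog
// Sketch: latch operands locally and only touch DDR3 once per operation.
// host_wr_ena/host_wr_addr/host_wr_data are assumed names for the
// controller's write-snoop signals; FPU_*_ADDR as parameterised above.
reg [31:0] fpu_a_r, fpu_b_r;
reg [1:0]  fpu_op_r;
reg        fpu_go;  // one-cycle pulse when the opcode address is written

always @(posedge clk) begin
    fpu_go <= 1'b0;
    if (host_wr_ena) begin
        case (host_wr_addr)
            FPU_A_ADDR:  fpu_a_r <= host_wr_data;
            FPU_B_ADDR:  fpu_b_r <= host_wr_data;
            FPU_OP_ADDR: begin
                fpu_op_r <= host_wr_data[1:0];
                fpu_go   <= 1'b1;  // kick off one write of the result
            end
        endcase
    end
end
```

With the FPU's result combinational on its registered inputs, fpu_go could queue exactly one DDR3 write of the result - no continuous polling, and still no extra ports in Bridgette since the host's last action is the opcode write anyway.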
Questions:
- Is this module any good?
- Is this a viable proposition (the module interface)?
- Is this the best way to implement this module in the project?
- If so, how should I go about linking its inputs/output to fixed memory addresses via the DDR3 memory controller?
- What's the best way to limit memory bandwidth hogging? Should I give in and use an IO port to signal to the FPU to do a calculation?