Topic: FPGA VGA Controller for 8-bit computer

james_s · « **Reply #25 on:** October 18, 2019, 01:11:58 am »

I use VGA (or composite) for this retro stuff because retro machines just look weird on LCD displays generally and I have a lot more monitors with VGA and only a couple things with HDMI. It's also a lot easier to wire VGA.

I wonder what happened to Grant Searle's website? I have a lot of the code stashed away but there was a ton of cool stuff on there.

Berni · « **Reply #26 on:** October 18, 2019, 06:33:04 am »

These days it's still common to share main system RAM between the CPU and GPU, but it's done differently.

Today caches are used on all sorts of CPUs, even small ones running below 100 MHz. This means the CPU does not have to touch RAM for every single instruction it executes. For code that loops a lot (usually where speed matters most), all of the instructions end up in the cache, so the CPU stops reading code from RAM entirely, and temporary calculations can usually be kept in the plentiful registers. Once things are cached, RAM accesses are mostly just moving final results in and out, and even that happens in large blocks as cache writebacks (dynamic RAM loves large burst accesses because of pipelining). All of this was needed to push CPUs into the hundreds of MHz, since main system RAM could not keep up.

So you can actually take the RAM away from the CPU for many cycles at a time and it won't even notice. Modern DRAM also easily runs at 100 MHz or more, and because caches access memory in wide bursts you can benefit from a really wide RAM bus (64-bit, 128-bit, etc.) to get even more speed out of it, even though the CPU itself might only be 16- or 32-bit. (In graphics cards, 512-bit RAM buses are not uncommon because they really need lots of bandwidth.)

All of this means plenty of RAM bandwidth can be left over from the CPU for the GPU. The bus arbiter can split the bandwidth between them however it wants (not the fixed 50/50 split that interleaved access used in the old days). But it also means the video output hardware needs some buffering of its own, as it is not guaranteed RAM access at a moment's notice. The CPU might be in the middle of a cache writeback that occupies the bus for, say, the next 16 cycles, or a DRAM refresh cycle might be happening, so the video hardware must hold enough pixels in its internal buffer to keep outputting the image while it waits for its turn at the RAM to catch up.
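One rough way to size that internal buffer is to simulate it. The sketch below is a minimal cycle-level model under hypothetical assumptions: the video side fetches one pixel per cycle whenever it owns the bus, the display consumes one pixel every `consume_every` cycles, and the CPU grabs the bus for a burst of `stall` cycles every `stall_period` cycles. It reports the lowest fill level reached, or -1 if the buffer ran dry.

```python
def min_fifo_level(depth, consume_every, stall, stall_period, cycles=10_000):
    """Simulate a pixel FIFO fed from a shared memory bus.

    depth:         FIFO capacity in pixels
    consume_every: the display pops one pixel every N bus cycles
    stall:         length of each CPU burst (video side locked out)
    stall_period:  a CPU burst starts every stall_period cycles
    Returns the lowest fill level seen; -1 means an underrun occurred.
    """
    level = depth   # assume the FIFO starts full (e.g. prefilled during blanking)
    lowest = depth
    for cycle in range(cycles):
        cpu_owns_bus = (cycle % stall_period) < stall
        # Video fetch: one pixel per cycle when it owns the bus and has room.
        if not cpu_owns_bus and level < depth:
            level += 1
        # Display consumption at a fixed rate, whether or not memory kept up.
        if cycle % consume_every == 0:
            if level == 0:
                return -1   # underrun: the monitor would see garbage
            level -= 1
        lowest = min(lowest, level)
    return lowest

for depth in (16, 8, 7):
    print(depth, min_fifo_level(depth, consume_every=2, stall=16, stall_period=64))
```

With a pixel leaving every other cycle and a 16-cycle CPU burst, an 8-pixel buffer just survives and a 7-pixel one underruns. Real numbers depend on the pixel clock, the memory clock and the arbiter's policy.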

If you are after a ton of graphics horsepower, you could give the GPU its own set of RAM chips on a huge wide bus, so it has a firehose of guaranteed memory bandwidth, and then either implement a bus mux that lets the CPU use that memory when the GPU is idle and the video output is in blanking, or have the GPU take commands from the CPU about what to put into graphics memory. But for a retro computer project that is way overkill: on a decent FPGA this would give you graphics horsepower surpassing early PC 3D accelerator cards such as the Voodoo.

nockieboy · « **Reply #27 on:** October 18, 2019, 08:33:46 am »

Quote from: jhpadjustable on October 17, 2019, 11:49:55 pm

Personally I'd have put a Z80 soft core and an SDRAM controller on the FPGA and called it 80% done

Quote from: Berni on October 18, 2019, 06:33:04 am

But for a retro computer project that is way overkill since on a decent FPGA this would have the graphics horsepower surpassing early 3D graphics accelerator cards in PC such as the Voodoo

I guess my motivations may be hard to understand for some, as I could achieve all this by just putting it ALL onto the FPGA. But my original intention was to learn how to build a computer from scratch (the desire came from stumbling across Grant Searle's page) and pick up some electronics skills on the way. I'm now working on a system that I want other people to be able to build, if they want to go on the same journey I did. It involves a lot of SMT parts now, as my soldering skills have developed, and an FPGA graphics card certainly IS overkill, but it just seems too good an opportunity to miss: being able to plug my little home-made computer into the living room TV and play Pong on it with the family.

Quote from: SiliconWizard on October 17, 2019, 04:24:43 pm

You could also design your MMU so that CPU and video accesses are interleaved. I dunno how fast your CPU is going to run, but if it's a few MHz (as that was back in the day) that should pose no issues.

The CPU runs at 8 MHz (or 4 MHz, if you want to experience true 80s CP/M!). So you think there would be no noticeable slowdown if I were to shut down memory access for ~73% of the time while the FPGA is accessing the frame buffer to draw the visible area of the screen? To me, knowing little about this sort of stuff, that equates to slowing the computer down by almost three quarters?
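For reference, the ~73% figure matches standard 640x480 @ 60 Hz VGA timing, where visible pixels occupy 640 of the 800 clocks in each line and 480 of the 525 lines in each frame. A quick check of that arithmetic:

```python
def active_fraction(h_visible, h_total, v_visible, v_total):
    """Fraction of the frame time during which visible pixels are being sent."""
    return (h_visible * v_visible) / (h_total * v_total)

# Standard 640x480 @ 60 Hz VGA: 800 clocks per line (640 visible),
# 525 lines per frame (480 visible), 25.175 MHz pixel clock.
frac = active_fraction(640, 800, 480, 525)
print(f"visible: {frac:.1%}, blanking: {1 - frac:.1%}")   # visible: 73.1%, blanking: 26.9%
```

Note that this is the share of time the display is busy, not necessarily the share of time the Z80 must be locked out of RAM; that depends on how the video side fetches from memory (for example, how many pixels each access returns).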

Part of the reason for this post was to discuss the best way to interface the FPGA with the computer - frame buffer in computer RAM, or gated via the FPGA, for example. Both have positives and negatives, I just don't have the knowledge or experience to understand their weight and balance the options properly, if that makes sense?

Quote from: SiliconWizard on October 17, 2019, 04:24:43 pm

Any Spartan 6 certainly has more than 18KBytes of embedded RAM. The LX4 (smallest) has 216Kbits (27KBytes), and the LX45, which is still reasonable price-wise, has 2088Kbits (261KBytes!)
You confused the "18Kb" (Kbits) figure for the Spartan 6, which is the individual size of the RAM blocks, certainly not the total available!

Ah, my mistake. I was a little surprised it was so low, actually, considering I was comparing it against the Altera Cyclone II I currently have, which is much older and has only a little less.

The LX45 is VERY promising. It's a little expensive for what I'm trying to do, but it would make life VERY easy: I could have an internal dual-port frame buffer giving resolutions up to 640x480 at 4-bit colour depth through a LUT, and still have over 100 KB free for the LUT, buffers, character set(s) and even sprites... And the best thing? It's available in TQFP, which I consider still reasonably easy to solder (I've never tried BGA, and other than using a frying pan I don't see how it could be done at home). Hmmmm... thanks SiliconWizard, that's a great suggestion.
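A quick budget check for that 640x480, 4-bit frame buffer, assuming it is packed into the LX45's block RAMs without using their parity bits. (A Spartan-6 18 Kb block holds 16 Kb of data plus 2 Kb of parity, and the parity bits are only reachable in the 9-, 18- and 36-bit-wide configurations.) On that assumption the buffer takes 75 of the 116 blocks and leaves 41, or 82 KB of byte-wide storage: still a healthy amount, but less than the raw 261 KB total suggests.

```python
import math

BRAM_COUNT = 116                 # 18 Kb blocks in an XC6SLX45 (116 x 18 Kb = 2088 Kb)
DATA_BITS_PER_BRAM = 16 * 1024   # 16 Kb data + 2 Kb parity per block; parity assumed unused

def brams_needed(width, height, bits_per_pixel):
    """Block RAMs needed for a packed frame buffer, ignoring the parity bits."""
    return math.ceil(width * height * bits_per_pixel / DATA_BITS_PER_BRAM)

used = brams_needed(640, 480, 4)
spare = BRAM_COUNT - used
print(used, spare, spare * DATA_BITS_PER_BRAM // 8 // 1024)   # 75 41 82
```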

Quote from: james_s on October 18, 2019, 01:11:58 am

I use VGA (or composite) for this retro stuff because retro machines just look weird on LCD displays generally and I have a lot more monitors with VGA and only a couple things with HDMI. It's also a lot easier to wire VGA.

I wonder what happened to Grant Searle's website? I have a lot of the code stashed away but there was a ton of cool stuff on there.

True, VGA was my original intention when I started this thread, but since HDMI has been suggested as not being impossible, I think it'd be far more future-proof to use that instead. I'm concerned VGA is declining in popularity: it will only get harder to find TVs and monitors with VGA inputs, and while I've built a 'retro' computer, I'm trying to keep future availability of parts in mind (partly hence the move to SMT components, as some are like gold dust in DIP form).

Grant's site seems to have been deleted, according to the site message - so it doesn't look like the domain has expired, it looks more like an intentional removal of the site.

EDIT:

I made a mistake: the LX45 is NOT available in TQFP, only BGA or wafer, which unfortunately means I can't use it (my SMT-soldering-fu isn't that strong!). At least, not unless I find a PCB manufacturer willing to solder it to the PCB for me as part of fabrication. I tend to use JLCPCB, so I've contacted them to find out whether they'll do it as part of their new SMT assembly service.

Berni · « **Reply #28 on:** October 18, 2019, 09:52:40 am »

With an FPGA you can go as far as you want. You can just implement a RAM buffer that gets drawn to the screen, if that's what you're after, but you can then add things like hardware graphics acceleration to get the kind of graphics seen on the 16-bit game consoles back in the day. Sound can also be implemented in the same FPGA, giving you emulation of the popular chiptune or FM synthesizer chips.

But if you just want to output video from a RAM frame buffer, you can get something like the Solomon Systech SSD1963 in an LQFP128 package:
https://www.buydisplay.com/download/ic/SSD1963.pdf

While it is called an "LCD Display Controller", it actually outputs an RGB video bus that can be turned into VGA via a DAC, or into HDMI via an RGB-to-HDMI converter chip. Its input is just a memory bus, so you connect it to your Z80 the same way you would connect SRAM or any other peripheral chip (though it is 3.3 V, so you might need level shifting). The frame buffer is inside the chip and it can run at up to 110 MHz, so no retro CPU will be too fast for it. It's essentially a video card on a chip.

The downside of such a ready-made solution is that it can't emulate anything else. If you have existing software that expects to talk to particular video hardware, it won't work here, whereas replacing this chip with an FPGA lets you customize the memory mapping and registers to mimic some existing video chip from a retro computer, so the existing software can run on it.

In a way an FPGA is kind of cheating if you are after the genuine retro computer building experience, where you have to breadboard 1 to 10 chips and run tens to hundreds of wires just to add a peripheral to your computer's memory bus. With an FPGA you just edit a few lines of code on your PC, load the new firmware into the FPGA, and boom, suddenly your computer has 3 extra GPIO ports and a PWM output. A few more lines of code and you also have a floating point coprocessor; some more code and you have a separate programmable DSP core in there too. Need more grunt? Copy and paste 3 more of those DSP cores and you have a quad-core DSP on the bus. Doing the same in the 80s would have required a few suitcases' worth of boards hanging off your computer; these days it's just some more code in an FPGA sitting on the bus.

Canis Dirus Leidy · « **Reply #29 on:** October 18, 2019, 10:07:06 am »

Quote from: dferyance on October 17, 2019, 01:40:32 pm

Yes, that is exactly right. The palette is a LUT. A 256 byte LUT can pretty easily fit in a FPGA so the lookup can be quite fast. One common technique used for old DOS VGA games is to modify the palette to do simple animations. This is called palette shifting. So while the palette can be limiting as you can only have 256 colors at a time it can also be handy as well.
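The palette idea in the quote can be sketched in a few lines: pixels store an index, the LUT maps each index to RGB, and palette shifting (often called colour cycling) animates the picture by rotating a range of LUT entries without rewriting a single pixel. The function names and the grey-ramp palette are illustrative only.

```python
def make_palette():
    """256-entry palette of (r, g, b) tuples; here just a grey ramp."""
    return [(i, i, i) for i in range(256)]

def lookup(pixels, palette):
    """Translate indexed pixels to RGB, as the video hardware does per pixel."""
    return [palette[p] for p in pixels]

def cycle_range(palette, start, end):
    """Rotate entries start..end-1 by one place (the colour-cycling trick)."""
    seg = palette[start:end]
    palette[start:end] = seg[-1:] + seg[:-1]

pal = make_palette()
frame = [16, 17, 18, 19]    # frame buffer contents never change
cycle_range(pal, 16, 20)    # one animation step touches only 4 LUT entries
print(lookup(frame, pal))   # [(19, 19, 19), (16, 16, 16), (17, 17, 17), (18, 18, 18)]
```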

A programmable palette (combined with bit-plane video RAM) also made life somewhat easier for programmers when the target machine didn't have hardware sprites.

Quote from: SiliconWizard on October 17, 2019, 02:56:25 pm

The CPU could for instance access it (to write to it or read from it) via some kind of memory mapping scheme, like maybe in chunks ("windows") of 8KB, and a bank selection mechanism. More efficient than using I/O accesses IIRC.
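The windowed scheme in the quote might look like this: the CPU sees one 8 KB window in its address space, and a bank register selects which 8 KB slice of video memory appears there. The window base address and the bank register are hypothetical choices for illustration.

```python
WINDOW_BASE = 0xC000   # hypothetical: Z80 addresses 0xC000-0xDFFF map to video RAM
WINDOW_SIZE = 0x2000   # 8 KB window

class BankedVideoRAM:
    def __init__(self, size):
        self.vram = bytearray(size)
        self.bank = 0   # set by the CPU via a (hypothetical) bank-select port

    def translate(self, cpu_addr):
        """Map a CPU address inside the window to a video RAM address."""
        offset = cpu_addr - WINDOW_BASE
        if not 0 <= offset < WINDOW_SIZE:
            raise ValueError("address outside the video window")
        return self.bank * WINDOW_SIZE + offset

    def write(self, cpu_addr, value):
        self.vram[self.translate(cpu_addr)] = value & 0xFF

    def read(self, cpu_addr):
        return self.vram[self.translate(cpu_addr)]

vr = BankedVideoRAM(150 * 1024)   # e.g. a 640x480, 4-bit frame buffer
vr.bank = 3
vr.write(0xC010, 0xAB)
print(hex(vr.translate(0xC010)))  # 0x6010
```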

*cough* MSX *cough* This standard explicitly required that all access to video memory go only through the video controller's ports.
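To illustrate the port-only model, here is a simplified behavioural sketch of how a TMS9918-family VDP (as used in the MSX) is driven: the CPU loads an internal address pointer with two writes to the control port, then streams bytes through the data port while the pointer auto-increments. It is not complete or cycle-accurate; register writes and VRAM reads are left out.

```python
class PortVDP:
    """Simplified model of a TMS9918-style VDP reached only through I/O ports.

    Control port (0x99 on the MSX): two writes load the 14-bit VRAM address
    pointer, low byte first. On the real chip the top two bits of the second
    byte select the operation (01 = VRAM write); this sketch only models writes.
    Data port (0x98 on the MSX): each write stores a byte at the pointer,
    which then auto-increments.
    """

    def __init__(self):
        self.vram = bytearray(16 * 1024)
        self.addr = 0
        self.latch = None   # holds the first byte of a two-byte control write

    def out_control(self, value):
        if self.latch is None:
            self.latch = value
        else:
            self.addr = ((value & 0x3F) << 8) | self.latch
            self.latch = None

    def out_data(self, value):
        self.vram[self.addr] = value
        self.addr = (self.addr + 1) & 0x3FFF

vdp = PortVDP()
vdp.out_control(0x00)          # address low byte
vdp.out_control(0x40 | 0x18)   # 01 + high bits 0x18: write mode, pointer = 0x1800
for ch in b"HI":
    vdp.out_data(ch)           # no address needed: the pointer auto-increments
```

The CPU never needs the VRAM on its own address bus, which is why this style pairs well with an FPGA that keeps the frame buffer to itself.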

Quote from: nockieboy on October 18, 2019, 08:33:46 am

Part of the reason for this post was to discuss the best way to interface the FPGA with the computer - frame buffer in computer RAM, or gated via the FPGA, for example. Both have positives and negatives, I just don't have the knowledge or experience to understand their weight and balance the options properly, if that makes sense?

Well, if you look at PC video chips from the EGA/VGA/early SVGA era (like the GD542x family or the ET4000), you will see that video memory was actually isolated from the CPU bus. All the processor could see was an intermediate buffer and translation logic, creating the illusion of direct access to video memory.

P.S. Speaking of resistor-based DACs, here's a little trick from a Chinese design:

Instead of an R-2R ladder, he (or she, or whoever) used SMD resistor arrays to make the binary-weighted resistor values.
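For a binary-weighted DAC driving a monitor input, the output level for each code follows from Millman's theorem: every bit drives its pin voltage through its own resistor into the monitor's 75 Ω termination. The sketch below assumes hypothetical 3.3 V logic outputs and a 3-bit channel of 500 Ω, 1 kΩ and 2 kΩ; VGA expects about 0.7 V full scale.

```python
def dac_levels(resistors, v_high=3.3, r_load=75.0):
    """Output voltage for every code of a binary-weighted resistor DAC.

    resistors[0] is the MSB resistor (smallest value). Each bit drives
    v_high or 0 V through its resistor into the monitor's r_load.
    Millman's theorem: V = sum(Vi/Ri) / (sum(1/Ri) + 1/Rload).
    """
    n = len(resistors)
    g_total = sum(1 / r for r in resistors) + 1 / r_load
    levels = []
    for code in range(2 ** n):
        current = 0.0
        for bit, r in enumerate(resistors):
            if code & (1 << (n - 1 - bit)):   # list position 0 is the MSB
                current += v_high / r
        levels.append(current / g_total)
    return levels

# Hypothetical 3-bit channel: each resistor double the previous one.
for code, v in enumerate(dac_levels([500, 1000, 2000])):
    print(code, round(v, 3))
```

With exact 1:2:4 values the steps come out evenly spaced (about 0.098 V each, 0.686 V full scale here), which is why accurate ratios matter; building them from matched resistors in an array is presumably what makes the trick attractive.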

legacy · « **Reply #30 on:** October 18, 2019, 10:19:44 am »

Quote from: james_s on October 18, 2019, 01:11:58 am

I use VGA (or composite) for this retro stuff because retro machines just look weird on LCD displays generally and I have a lot more monitors with VGA and only a couple things with HDMI. It's also a lot easier to wire VGA.

My Sonoko project uses three VGA LCD screens, a VGA service card, and a VGA KVM. The NEC VGA LCD screens are 17-inch and were bought new. Personally, I do not like HDMI because it consumes too much bandwidth, on the wires and in the circuitry, and the products are more expensive: the HDMI version of the KVM costs double, and the cabling is much more expensive. That's the premium for super-modern technology (e.g. the Advoli TA6 is a PCIe GPU card with nothing but HDMI-via-LAN outputs), but ... this stuff is a no-go unless you are in business with a squad of engineers.

SiliconWizard · « **Reply #31 on:** October 18, 2019, 02:49:33 pm »

Quote from: nockieboy on October 18, 2019, 08:33:46 am

Quote from: SiliconWizard on October 17, 2019, 04:24:43 pm
You could also design your MMU so that CPU and video accesses are interleaved. I dunno how fast your CPU is going to run, but if it's a few MHz (as that was back in the day) that should pose no issues.

The CPU runs at 8 MHz (or 4 MHz, if you want to experience true 80's CP/M!). So you think there would be no noticeable slow-down if I were to shut down memory access for ~73% of the time while the FPGA is accessing the frame buffer to draw the visible area of the screen? To me, knowing little about this sort of stuff, that equates to slowing the computer down by almost three quarters?

Well, nope. The interleaving suggestion implied that the RAM would actually be accessed faster, so the CPU wouldn't see a difference. To pull that off you would of course need fast enough RAM (which should be no problem with a modern SDRAM or even SRAM chip) and to clock the MMU faster. Also, to keep things simple, you'd want the CPU and video clock frequencies to be integer multiples of each other.
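One way to picture the interleaving: run the RAM at twice the CPU's memory-access rate and alternate slots, with even slots belonging to the CPU and odd slots to the video fetcher, so neither side ever waits for the other. The 2:1 ratio and the slot assignment are assumptions for illustration.

```python
def interleave(cpu_requests, video_requests):
    """Serve two request streams from one RAM using fixed alternating slots.

    Each list holds one entry per CPU memory cycle (an address, or None if idle).
    The RAM runs at 2x that rate: slot 2k serves the CPU, slot 2k+1 the video.
    Returns the RAM's access log as (slot, owner, address) tuples.
    """
    log = []
    for k, (cpu, vid) in enumerate(zip(cpu_requests, video_requests)):
        if cpu is not None:
            log.append((2 * k, "cpu", cpu))
        if vid is not None:
            log.append((2 * k + 1, "video", vid))
    return log

log = interleave([0x0100, 0x0101, None], [0x8000, 0x8001, 0x8002])
for entry in log:
    print(entry)
```

Because every CPU cycle has its own slot, the CPU runs at full speed; the cost is RAM and control logic that are twice as fast.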

james_s · « **Reply #32 on:** October 18, 2019, 04:40:14 pm »

If you want to play with (crude) graphics hardware, have a look at some of the early arcade games I've done.

https://github.com/james10952001

I've made them quite modular so you can easily mix & match pieces. The old Atari B&W games are cool because the video hardware is entirely independent of the CPU which just writes into RAM and then the hardware continuously reads these RAM locations and uses them to address object ROMs that result in objects on the screen. Originally this used standard SRAM with data selectors but in some of these I just made it dual ported RAM which makes it super easy to "wire" it up to whatever CPU you want. The object ROMs are easily customized as well to display whatever characters you want, and the schematics for the original hardware are readily available. Hack away.
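The "CPU writes codes into RAM, hardware looks them up in an object ROM" idea can be sketched as a tiny tile renderer. The 8x8 tile size, 1-bit pixels and the ROM contents are made up for illustration, not the layout of any particular Atari board.

```python
TILE = 8  # hypothetical 8x8 tiles, 1 bit per pixel

# Object ROM (made-up contents): tile code -> 8 rows, one byte per row, MSB = leftmost.
OBJECT_ROM = {
    0: [0x00] * 8,                                        # blank
    1: [0x18, 0x3C, 0x7E, 0xFF, 0xFF, 0x7E, 0x3C, 0x18],  # diamond
}

def render(tile_ram, cols, rows):
    """Scan out the screen the way the hardware would, pixel by pixel."""
    lines = []
    for y in range(rows * TILE):
        line = []
        for x in range(cols * TILE):
            code = tile_ram[(y // TILE) * cols + (x // TILE)]   # RAM read
            row_bits = OBJECT_ROM[code][y % TILE]               # ROM read
            pixel = (row_bits >> (7 - x % TILE)) & 1
            line.append("#" if pixel else ".")
        lines.append("".join(line))
    return lines

screen = render([1, 0, 0, 1], cols=2, rows=2)   # the CPU only wrote these 4 bytes
print("\n".join(screen))
```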

I should add that these are all targeted to the same $12 FPGA board that Grant used, and in fact if you poke around there's a PCB in that repository for a daughter board that plugs right into that FPGA and has sockets for RAM, keyboard, micro SD, etc.

nockieboy · « **Reply #33 on:** October 18, 2019, 07:02:29 pm »

Quote from: Berni on October 18, 2019, 09:52:40 am

With a FPGA you can go as far as you want to go. You can just implement a RAM buffer that gets drawn to the screen if that's what you are after. But you can then add on things like hardware graphics acceleration if you want to have the graphics seen on the 16bit game consoles back in the day. Sound can also be implemented in the same FPGA giving you emulation of the popular chiptune or FM synthesizer chips.

Hmm.. if I can get an AY-3-8910 implementation up and running on the same FPGA as the graphics, that'd be great and save me a lot of kerfuffle with the sound card using a genuine chip, which I'm having timing problems with at 8 MHz at the moment.
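If the AY-3-8910 does end up in the FPGA, its tone registers should behave like the real chip's: a 12-bit tone period TP gives f = f_clock / (16 × TP), split across a fine (8-bit) and a coarse (4-bit) register, e.g. R0 and R1 for channel A. A small helper, assuming a hypothetical 2 MHz PSG clock:

```python
def ay_tone_registers(freq_hz, clock_hz=2_000_000):
    """Return (fine, coarse) register values for an AY-3-8910 tone channel.

    Tone frequency = clock / (16 * TP), where TP is a 12-bit period.
    clock_hz defaults to a hypothetical 2 MHz PSG clock.
    """
    tp = round(clock_hz / (16 * freq_hz))
    if not 1 <= tp <= 0xFFF:
        raise ValueError("frequency out of range for a 12-bit tone period")
    return tp & 0xFF, tp >> 8

fine, coarse = ay_tone_registers(440)   # concert A
print(fine, coarse)                     # 28 1
```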

On that note, I'd like to include a keyboard handler as well. The thought crossed my mind that I might be able to include a USB host (or USB On-The-Go, at least), so that I could plug in a USB keyboard without depending on dwindling supplies of PS/2 keyboards. A quick search earlier showed that a USB OTG IP core is available for the Spartan, but it looks like it could be costly to buy or licence.

Quote from: Berni on October 18, 2019, 09:52:40 am

...But the downside of using such a ready made solution is that it can't emulate anything else. So if you have existing software that expects to talk to certain video hardware then it wont work here, but replacing this chip with a FPGA lets you customize its memory mapping and registers in a way that mimics some existing video chip in some retro computer and allows the existing software to run on it.

You're right, using an FPGA is cheating - but I might take a look at the video chip you mentioned as an alternative. My system is highly modular, so I could develop an FPGA-based video/sound/keyboard card AND one based on separate and more authentic discrete chips for those features and leave the choice of which they'd prefer to the person building the system.

Quote from: Canis Dirus Leidy on October 18, 2019, 10:07:06 am

And also a programmable palette (in combination with bit planes video RAM ) made life somewhat easier for a programmer, when your target machine don't have hardware sprites:

Спасибо (thank you)! I wish my Russian were better so I could read the writing, but I think I get the idea.

Quote from: Canis Dirus Leidy on October 18, 2019, 10:07:06 am

*cough* MSX *cough* This standard explicitly prescribed that all access to video memory should be made only through the ports of the video controller.

Yes, a friend of mine has been recommending I look at the MSX architecture for tips, I'm starting to see why now. So passing data via IO calls to the FPGA is a viable proposition then. I can see this method being quite useful for things like clearing the screen, or drawing rectangles and other basic shapes using only a handful of IO instructions. I'm going to have to give some serious thought to what sort of instructions I'll want to implement in the video controller.

A gold-standard outcome for me would be a video controller that lets my computer produce graphics for half-decent games. Pong and Breakout should be easy enough initially, but I'd like to recreate the kind of games that were commonplace on the 8-bit systems of the 80s. The 80s computers I'm familiar with all used interleaved-memory frame buffers, so I'm assuming these sorts of games are still possible using the MSX method, since the MSX managed it?

Quote from: Canis Dirus Leidy on October 18, 2019, 10:07:06 am

Well, if you look at PC video chips from EGA/VGA/Early SVGA era (like GD542x family or ET4000), you will see that video memory was actually isolated from CPU bus. All that the processor could see was an intermediate buffer and translation logic, creating the illusion of direct access to video memory.

Yes, I figured having a buffer would be helpful as the FPGA may not be able to access the frame buffer immediately on receipt of a command without causing tearing. I'm still not 100% convinced I fully understand how it'll work with the Z80 sending commands/data to the FPGA via an IO port, but I'm sure I'll get my head around it at some point.

Quote from: legacy on October 18, 2019, 12:44:12 pm

CP/M was designed to use VT100 and VT220 terminal, hence the FPGA needs to implement a simple VDU, or the SOC needs to implement a simple serial link with a decent FIFO to use an external VT100.

Yes, I'm intending to emulate the VT100 terminal and handle ANSI-escape codes in the video controller. Grant's Multicomp VHDL does a good job of that already.
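For anyone following along, the core of ANSI escape handling is a small state machine: plain characters go to the screen, ESC [ starts a control sequence, digits and ';' accumulate parameters, and a final letter selects the action (e.g. H for cursor position, J for erase display). A minimal sketch of the parsing step only; a real VT100 emulator handles far more, including other ESC sequences and private modes, which this one simply drops.

```python
def parse_ansi(stream):
    """Split bytes into ('text', char) and ('csi', final_letter, params) events."""
    events = []
    state = "normal"
    params = ""
    for ch in stream.decode("ascii"):
        if state == "normal":
            if ch == "\x1b":
                state = "esc"
            else:
                events.append(("text", ch))
        elif state == "esc":
            if ch == "[":
                state, params = "csi", ""
            else:
                state = "normal"   # other ESC sequences ignored in this sketch
        else:  # state == "csi"
            if ch.isdigit() or ch == ";":
                params += ch
            else:
                nums = [int(p) if p else None for p in params.split(";")] if params else []
                events.append(("csi", ch, nums))
                state = "normal"
    return events

print(parse_ansi(b"\x1b[2J\x1b[10;5HHi"))
```

The example clears the screen (2J), moves the cursor to row 10, column 5 (10;5H), and then prints "Hi".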

Quote from: SiliconWizard on October 18, 2019, 02:49:33 pm

Well, nope. The interleaving suggestion implied that the RAM would actually be accessed faster, so the CPU wouldn't see a difference. To pull that off, you would of course need a fast enough RAM (which should be no problem with a modern SDRAM chip or even SRAM), and clocking the MMU faster. Also, to keep things simple, you'd have to have the CPU and video clock frequencies multiple.

You mean the FPGA would read the 'frame buffer' in the Z80's logical memory space into an internal buffer really quickly, then pass that out as a pixel stream? Otherwise surely the frame buffer would be sending data at the rate the screen mode requires, which means 73% of the time it would be sending data and locking the Z80 out? Not sure I understand this fully.

Quote from: james_s on October 18, 2019, 04:40:14 pm

If you want to play with (crude) graphics hardware, have a look at some of the early arcade games I've done.
...Hack away.

Ah thanks for that james_s - will take a good look over the weekend.

jhpadjustable · « **Reply #34 on:** October 18, 2019, 07:39:32 pm »

Quote from: nockieboy on October 18, 2019, 08:33:46 am

I guess my motivations may be hard to understand for some

I understand it and respect it, even if I don't personally feel the attraction (anymore).

To paraphrase an old proverb, maybe Daoist, "The man who knows no stories is a fool. The man who knows many stories is wise. The man who knows one story is dangerous." The Old Ones have bequeathed many useful stories to us through their artifacts. I urge you to look at what other designers were doing in the 1980s when tasked to write a display engine. Even a close reading through the programmer's manual of some of these older systems will give you some flavor of their concerns and the clever tricks they used to get (then) high-performance video out of (today) low-performance hardware while holding to an accessible price point.

But you have to know your medium before you can create a useful artifact. Digital design is not a qualitative discipline. There is no substitute for looking at timing diagrams and characteristics tables, establishing cause and effect, looking up propagation delays, and doing the sums. For example,

Quote

no noticeable slow-down if I were to shut down memory access for ~73% of the time while the FPGA is accessing the frame buffer to draw the visible area of the screen

isn't a very useful question. The better question is, how much slowdown would there be if...? To reckon that, we have to look at the Timing section of the Z80 CPU User Manual, UM0080. For example, if you look at the instruction fetch cycle you'll see that the last 2T of the M1 cycle is spent "refreshing", where the Z80 doesn't care what's on the bus but still drives some of the address pins as a convenience to users of then-new DRAM, and that the Z80's proper business is done at the end of T2. We also know that our memory system can service instruction reads in 2T, and that, if we are using SRAM or a DRAM controller that handles its own refresh, the Z80's refresh cycle is wasted time. What if we simply disconnect those pins from the data bus at T3, hold the read value in a latch for the convenience of the Z80, and let other hardware access the bus instead? You just found a £10 note in the sofa cushions, depending on what kind of deal your company was able to score on DRAM.

Back to cases. Having written out some ins and outs, you might then look at the I/O read and write machine cycles, and see that they are 4T in length. For I/O writes, we see that all address, control, and data of interest are available in T2. If all our devices are fast enough to have completed the write by the end of T2, we can just disconnect the Z80 from the bus and leave the system bus free for other business during TW* (third beat) and T3 (fourth beat). For I/O reads, we see that all the address and control are once again available in T2, but the CPU doesn't read the data until the second half of T3. So, again assuming that our devices are fast enough, we will have our read data by the end of T2 and need only hold it for the CPU through TW and T3. We can then unhook the Z80 from the bus at the end of T2 and go about our other business. Cool, another £10 in the cushions!

Now we look at loads and stores. We see that memory write cycles are 3T in length, and that our snappy little jig has become instruction-dependent math rock. But! The Z80, like most other processors of the time, allows bus cycles to be stretched to accommodate slow hardware. We can insert one TW to lengthen the cycle to 4T and keep the rhythm, taking note of the penalty in order to answer our original question. Having done that, we look again at the memory write cycle and see that data and address are valid by the end of T1, but the !WR signal isn't valid until the end of T2. We can assume by the end of T1 that, if !MREQ is asserted and !RD is not, !WR will be asserted by the end of T2, and prepare accordingly. The rest of the write cycle follows that of I/O out cycles with the exception of the wait state we inserted, and we can likewise disconnect the CPU from the bus for TW and T3 without the Z80 being any the wiser. £10! As for loads, we see that they too are 3T in length, so to keep the rhythm let's add the wait state and mark the penalty. Looking at our extended read cycle, we once again see that data isn't sampled until the end of T3 (fourth beat, as TW was inserted as the third beat). If our memories are fast enough, we can sample the data at the end of T2 just as we did for the I/O and hold it for the CPU, while we unhook the CPU from the bus and use the last 2T for our own business. £10!

Finally let's look at the interrupt request/acknowledge cycle, which is at minimum 5T long, and now we've gone to playing experimental jazz. But we will know what kind of cycle it is by the end of T1, and know it is an interrupt if we see !M1 asserted and neither !MREQ nor !IORQ asserted. In our system design we have a decision to make: do we want to pass the vector cycle through to the system bus, or handle it off the system bus? If you pass it through, you can treat it much the same as any other memory read cycle but generate the !WAIT signal and hold the received vector for 5T longer than otherwise, which we will mark into the penalty column. If you prefer to handle it off the system bus, you do the same with the !WAIT signal but disconnect the CPU from the bus and supply the vector by your choice of means. Your call. In either case, to keep the rhythm we have to stretch the interrupt cycle out to 8T. A 50p coin is better than nothing.

The NMI cycle is just a dummy instruction fetch with a runt pulse on !MREQ in T3 which we can ignore because the CPU would be off the bus anyway. £1!

The overall effect, then, is that we have introduced some ancillary logic to the Z80 so that it can vacate the system bus for 2T out of every 4T, at the cost of memory access by the CPU taking 14.3% longer than theoretically possible, and a modest hit to interrupt latency which in practice may not mean all that much. In doing so we have saved millions of RAM chips and tens of millions of pounds. Going a little bit out of brief, while the video system is not actively fetching frame buffer data, its 2T cycles could be borrowed to service other peripherals, for example, to buffer up sprite data, service disk controllers, or output PCM audio. Going very far afield, it may be evident that, instead of display DMA, we could place a second Z80 with much the same ancillary logic, sync them up, and have them both run at nearly full speed out of the same memory and I/O space, with the usual caveats about multiprocessing. If you were feeling especially naughty, you could pull the wool over the second CPU's data lines and feed it NOP instructions while exploiting its program counter as an address generator for the video output (beware, people have been knighted for doing this sort of thing).
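The bookkeeping above can be checked in a few lines: stretch every machine cycle with wait states up to a multiple of the 4T rhythm and compare totals. The T-state breakdowns are the standard Z80 figures for these instructions (e.g. LD A,(HL) is a 4T opcode fetch plus a 3T memory read); the padding rule is the scheme described above. A fetch-plus-read instruction going from 7T to 8T is presumably where the 14.3% figure comes from; other instructions pay more or less.

```python
import math

SLOT = 4  # the CPU owns the bus for 2T out of every 4T, others for the other 2T

# T-states per machine cycle, from the Z80 timing tables.
INSTRUCTIONS = {
    "NOP":       [4],         # opcode fetch only
    "LD A,(HL)": [4, 3],      # fetch + memory read
    "JP nn":     [4, 3, 3],   # fetch + two operand reads
    "OUT (n),A": [4, 3, 4],   # fetch + operand read + I/O write
}

def padded(cycles):
    """Stretch every machine cycle with wait states up to a multiple of 4T."""
    return sum(math.ceil(t / SLOT) * SLOT for t in cycles)

for name, cycles in INSTRUCTIONS.items():
    base, new = sum(cycles), padded(cycles)
    print(f"{name:10} {base:2}T -> {new:2}T  (+{(new - base) / base:.1%})")
```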

Quote

Part of the reason for this post was to discuss the best way to interface the FPGA with the computer - frame buffer in computer RAM, or gated via the FPGA, for example. Both have positives and negatives, I just don't have the knowledge or experience to understand their weight and balance the options properly, if that makes sense?

The word salad I wrote above is a walk through the sort of thinking the Old Ones utilized in many of the 1980s home computer designs, starting with the need to compete vigorously on both BOM cost and performance. The IBM PC, coming from its mainframe background and the culture of modularity, could not engage in this level of coupling. Either approach is certainly feasible. Which one is more desirable is a systems-level decision that depends, in part, on the desired display size and depth, and in turn the desired pixel rate, but also on cost, code size, flexibility, programmer convenience, and so on. "Better than the Amstrad" is an open-ended brief that could encompass anything from the tightly-coupled home computer systems as napkin-designed here, to an ISA bus bridge to a CGA/VGA/EGA/Hercules card rescued from the scrap pile. In any case, there will be side effects which you also have to pursue down the line and decide to live with for your application.

Oh, if you want to play USB host, the SL811HS is a classic choice, which incidentally can also be configured as a device should you wish to link your system to your PC. Be advised that USB HID can be a bit of a hairball.
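Part of the hairball can be sidestepped with the HID boot protocol, which most keyboards support: each report is 8 bytes (a modifier bitmap, a reserved byte, then up to six key usage codes), and new key presses are found by diffing consecutive reports. A minimal sketch of the decoding step only; getting the reports off the bus (via the SL811HS or otherwise) is not shown, and the keymap covers only letters, digits, Enter and Space (shifted symbols are not handled).

```python
LEFT_SHIFT, RIGHT_SHIFT = 0x02, 0x20   # bits in the modifier byte

def usage_to_char(usage, shift):
    """Map a HID keyboard usage ID to ASCII (letters, digits, Enter, Space only)."""
    if 0x04 <= usage <= 0x1D:                 # a..z
        ch = chr(ord("a") + usage - 0x04)
        return ch.upper() if shift else ch
    if 0x1E <= usage <= 0x26:                 # 1..9
        return chr(ord("1") + usage - 0x1E)
    if usage == 0x27:
        return "0"
    return {0x28: "\r", 0x2C: " "}.get(usage)

def new_keys(prev_report, report):
    """Return characters for keys present in report but not in prev_report."""
    shift = bool(report[0] & (LEFT_SHIFT | RIGHT_SHIFT))
    before = set(prev_report[2:8])
    out = []
    for usage in report[2:8]:
        if usage and usage not in before:
            ch = usage_to_char(usage, shift)
            if ch is not None:
                out.append(ch)
    return out

idle = bytes(8)
press = bytes([LEFT_SHIFT, 0, 0x0B, 0, 0, 0, 0, 0])   # Shift + 'h'
print(new_keys(idle, press))                          # ['H']
```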

Berni · « **Reply #35 on:** October 18, 2019, 08:09:10 pm »

You are worried a bit too much about the FPGA implementation of the memory, framebuffer and access methods.

A lot of the retro ways of doing things in 80s computers are optimizations to get the thing working with as few chips as possible, containing as few transistors as possible, in order to make a computer cheap enough for a home user to afford. But modern FPGAs are rather large, so keeping the gate count down is not so important, and you can now cheaply buy a few megabytes of SRAM with 10 ns access time (so a max clock speed of about 100 MHz). This means those tricks for reducing transistor count and memory usage are not needed to make the design fit into a reasonably sized FPGA.

So if you connect all your components to the pins of an FPGA, you can implement your graphics card any way you like. The block RAM inside FPGAs is naturally dual-port anyway, so it can be written at the same time as it's being read, and it can typically run at >100 MHz. This means you can implement access to the memory in any way you wish. In fact, it would be perfectly possible to have 8 of those Z80s talking to the same RAM inside the FPGA simultaneously, doing read/write accesses to the same memory locations, all at the full 8 MHz on each Z80. Because modern external SRAM has 10 ns access times, the same is possible even with external memory chips. But since you have only one Z80, 90% of the available memory bandwidth is left unused and can be used by the video or graphics acceleration hardware in the FPGA without slowing the Z80 down at all.

So just pick whatever graphics architecture you would like and implement it in the FPGA. You can even implement several at once, like a register/command-based interface and memory mapping simultaneously, or multiple modes like PC graphics cards have, where various text and graphics modes offer different advantages (some provide lots of resolution and colors, others very few but take less CPU horsepower to use).

That being said, the modern way of handling graphics acceleration in PCs is in the form of "draw calls". The CPU can read/write GPU memory as it wishes, but GPU memory tends to be in physically separate memory chips for speed reasons, so the GPU just forwards those read/write operations through to its memory. To make the GPU do work, the CPU places a list of things to do into its memory. Each item on that list is a draw call: a structure describing what you want the GPU to do, ranging from "Set pixel 23,500 to color 255,255,255" to "Draw bitmap located at 0x554600 (512x512 RGBA 32bit) into framebuffer at coordinates 84,46 with alpha blending enabled" to "Apply the matrix transformation [0.45,1,1,0,5.4,5.5 .....] to the array of 16384 points in XYZ located at 0x7755000" to "Load the shader script bytecode at 0x5477400 into 128 shader units and run in parallel on the entire framebuffer" and so on and so on. This list can be as long as you like, and these days it is typically built by an API like OpenGL or DirectX with the help of the graphics drivers. Once this "cooking recipe" is in the GPU's memory, the GPU is told to execute it and runs along on its own without the CPU's intervention, and some milliseconds later the delicious resulting image will be sitting in another location of video memory. If the video output hardware is pointing to that location as its framebuffer, then the image will also be sent out via HDMI so that your eyes can enjoy the delicious image the GPU has baked for you.
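A toy version of the draw-call idea, as a C model of what the FPGA would do: the CPU fills an array of commands, and the "GPU" walks it until an end marker. The opcodes, struct layout and framebuffer size are all hypothetical; a real design would define its own command format and would implement the executor in HDL rather than C.

```c
#include <stdint.h>

/* Hypothetical opcodes and layout, just to show the shape of a command list. */
enum draw_op { OP_END = 0, OP_SET_PIXEL, OP_FILL_RECT };

struct draw_call {
    uint8_t  op;
    uint16_t x, y, w, h;   /* w and h are ignored by OP_SET_PIXEL */
    uint8_t  colour;
};

#define FB_W 64
#define FB_H 48
static uint8_t framebuffer[FB_H][FB_W];   /* 8 bits per pixel */

/* The "GPU" side: walk the list until OP_END, clipping to the framebuffer. */
void execute_draw_list(const struct draw_call *c)
{
    for (; c->op != OP_END; c++) {
        uint32_t w, h;
        switch (c->op) {
        case OP_SET_PIXEL: w = 1;    h = 1;    break;
        case OP_FILL_RECT: w = c->w; h = c->h; break;
        default: continue;                /* unknown opcode: skip it */
        }
        for (uint32_t y = c->y; y < c->y + h && y < FB_H; y++)
            for (uint32_t x = c->x; x < c->x + w && x < FB_W; x++)
                framebuffer[y][x] = c->colour;
    }
}
```

The CPU writes only a few bytes per command (a FILL_RECT covering thousands of pixels is still a single entry), which is the whole point of handing the work to the GPU.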

This draw call approach is by far not the only solution, but it's one that fits well on modern computers, because the CPU can just throw over this few-kilobyte "cooking recipe" and go do something more important while the GPU independently works hard on baking it without hogging any of the CPU's resources.

Obviously implementing all the draw calls of a modern graphics card is an insanely huge task, but to get some impressive graphics going you only need a few simple ones: things like filling a rectangle with a single color or drawing one bitmap on top of another, perhaps also with transparency effects and alpha blending. Later on you might want to add support for 2D transformation matrices, as that requires little hardware but lets you implement the kind of stuff the SNES can do in Mode 7. All of this can result in graphics that surpass the 16-bit console era, because you have the unfair advantage of roughly 10 times more memory bandwidth than those consoles had. But for 8-bit games, what you will find most useful is tilemap and sprite support, since it requires very little data to be manipulated by the slow CPU, so it's very fast even on wimpy old chips.
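Why tilemaps and sprites are so cheap for the CPU is easiest to see from how one pixel's colour gets looked up. This C sketch assumes 8x8 tiles at 8 bits per pixel, a 40x30 tile map (a 320x240 screen) and colour 0 meaning "transparent" in sprites; all of those are arbitrary choices. In hardware this lookup would run once per pixel as the display is scanned out.

```c
#include <stdint.h>

/* Hypothetical layout: 8x8 tiles, 8 bpp, 40x30 tile map, colour 0 transparent. */
#define TILE  8
#define MAP_W 40
#define MAP_H 30

struct sprite {
    int16_t x, y;               /* top-left screen position */
    const uint8_t *pixels;      /* TILE*TILE pixels, row-major */
};

/* Colour of screen pixel (sx, sy), with sprites drawn in front of the tiles. */
uint8_t pixel_at(const uint8_t tilemap[MAP_H][MAP_W],
                 const uint8_t tiles[][TILE * TILE],
                 const struct sprite *sprites, int n_sprites,
                 int sx, int sy)
{
    /* Sprites first: the first non-transparent sprite pixel wins. */
    for (int i = 0; i < n_sprites; i++) {
        int dx = sx - sprites[i].x, dy = sy - sprites[i].y;
        if (dx >= 0 && dx < TILE && dy >= 0 && dy < TILE) {
            uint8_t c = sprites[i].pixels[dy * TILE + dx];
            if (c != 0)
                return c;
        }
    }
    /* Otherwise find which tile covers this pixel, then the pixel inside it. */
    uint8_t tile = tilemap[sy / TILE][sx / TILE];
    return tiles[tile][(sy % TILE) * TILE + (sx % TILE)];
}
```

Changing one byte of the tile map redraws a whole 8x8 block, and moving a sprite only means changing its x and y, so the Z80 touches very little memory per frame.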

Oh, and you want to avoid things like USB. Yes, you can get a USB host IP module for an FPGA, but that will only implement the actual USB port and its transfer protocol (a glorified UART); you still need the higher-level protocols that initialize USB devices, plus drivers to talk to them. That needs a CPU to handle, so you will likely end up with a softcore CPU inside your FPGA that's actually more powerful than the Z80 itself, sitting there just to run drivers for USB devices. If you want a mouse and keyboard, stick to the simple PS/2.
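PS/2 really is simple by comparison: the keyboard clocks out 11-bit frames made of a start bit (0), eight data bits LSB first, an odd parity bit and a stop bit (1). In the FPGA this would be a shift register clocked by the PS/2 clock line; this C function just models the checks on an already captured frame, with bit 0 holding the first bit received.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode one 11-bit PS/2 device-to-host frame (bit 0 = start bit,
   bit 10 = stop bit). Returns true and stores the data byte if the
   framing and odd parity check out. */
bool ps2_decode_frame(uint16_t frame, uint8_t *data)
{
    if ((frame & 1u) != 0)               /* start bit must be 0 */
        return false;
    if (((frame >> 10) & 1u) != 1u)      /* stop bit must be 1 */
        return false;

    uint8_t byte = (uint8_t)(frame >> 1);   /* data bits 1..8, LSB first */
    unsigned ones = (frame >> 9) & 1u;      /* start with the parity bit */
    for (int i = 0; i < 8; i++)
        ones += (byte >> i) & 1u;
    if ((ones & 1u) == 0)                /* odd parity: total must be odd */
        return false;

    *data = byte;
    return true;
}
```

For example, scan code 0x1C (the A key in scan code set 2) has three 1 bits, so its parity bit is 0 and the whole frame is 0x438.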

nockieboy · « **Reply #36 on:** October 18, 2019, 09:56:46 pm »

Quote from: jhpadjustable on October 18, 2019, 07:39:32 pm

For example, if you look at the instruction fetch cycle you'll see that the last 2T of the M1 cycle is spent "refreshing"...
The overall effect, then, is that we have introduced some ancillary logic to the Z80 so that it can vacate the system bus for 2T out of every 4T, at the cost of memory access by the CPU taking 14.3% longer than theoretically possible, and a modest hit to interrupt latency which in practice may not mean all that much...

I suspect I'm going to be reading and analysing what you've written here for months to come.

But from what I can gather, there's the opportunity to steal cycles from the Z80 and open the address and data buses up to peripherals (like the video controller) whilst the Z80 is pausing for breath in the middle of its memory/IO cycles? That's some major optimisation!! Did you design these systems back in the 80's?

Quote from: jhpadjustable on October 18, 2019, 07:39:32 pm

(beware, people have been knighted for doing this sort of thing).

Haha - yes, I'm aware there are quite a few hacks in the old Sinclair ZX80 and ZX81 (and even in the later Spectrums, I believe) to get them to display video in between reading the keyboard, or something? I remember there being a reason why the screen blanks out when you press a key...

Quote from: jhpadjustable on October 18, 2019, 07:39:32 pm

Either approach is certainly feasible. Which one is more desirable is a systems-level decision that depends, in part, on the desired display size and depth, and in turn the desired pixel rate, but also on cost, code size, flexibility, programmer convenience, and so on. "Better than the Amstrad" is an open-ended brief that could encompass anything from the tightly-coupled home computer systems as napkin-designed here, to an ISA bus bridge to a CGA/VGA/EGA/Hercules card rescued from the scrap pile. In any case, there will be side effects which you also have to pursue down the line and decide to live with for your application.

That's really what I wanted to know - is there some big reason NOT to go for one approach over another. I'm thinking for the sake of simplicity, I'll go with the Z80 only having a simple IO connection to the FPGA. At least in the first instance - I can always change to an alternative method of interfacing later if the need arises, I guess.

Quote from: jhpadjustable on October 18, 2019, 07:39:32 pm

Oh, if you want to play USB host, the SL811HS is a classic choice, which incidentally can also be configured as a device should you wish to link your system to your PC. Be advised that USB HID can be a bit of a hairball.

Well, I'm aware there's a significant software overhead due to the vast range of HID devices - I would literally just be looking for basic keyboard reading - but perhaps I'll have to stick to PS2 then.

Quote from: Berni on October 18, 2019, 08:09:10 pm

You are worried a bit too much about the FPGA implementation of the memory, framebuffer and access methods.

I guess this stems from my lack of experience with FPGAs. I'm learning quite quickly from this forum that FPGAs are actually blisteringly fast and flexible beyond belief - at least to my inexperienced mind.

Quote from: Berni on October 18, 2019, 08:09:10 pm

So if you connect all your components to the pins of an FPGA you can implement your graphics card in any way you like. The block RAM inside FPGAs is naturally dual-port anyway, so it can be written to at the same time as it's being read, and it can typically run at >100MHz. This means you can implement access to the memory in any way you wish. In fact it would be perfectly possible to have 8 of those Z80s talking to the same RAM inside the FPGA simultaneously, doing read/write access to the same memory locations, all at the full 8MHz speed on each Z80. Because modern external SRAM has 10ns access times, the same is possible even with external memory chips. But since you have only one Z80, 90% of the available memory bandwidth is left unused and can be used by any video or graphics acceleration hardware built in the FPGA without slowing down the Z80 at all.

And that's exactly the kind of answer I needed - perhaps I have been overly concerned about the speed of the design, but that's born out of a lack of knowledge. I did admit that I'm no expert in this field and have been learning as I've gone along.

So basically, I should just crack on with my preferred design and see how it goes. These FPGAs are so fast and flexible that most of the magic will be done in the VHDL?

Quote from: Berni on October 18, 2019, 08:09:10 pm

This draw call approach is by far not the only solution, but it's one that fits well on modern computers, because the CPU can just throw over this few-kilobyte "cooking recipe" and go do something more important while the GPU independently works hard on baking it without hogging any of the CPU's resources.

This was one of the possible options floating around in my head for the IO-based interface. The Z80 could just send a load of commands and data to the FPGA which would buffer it into a FIFO (if needed - the FPGA will likely not need to buffer much as it's so much quicker, I'm guessing) and the FPGA would perform operations on those commands into the frame buffer whilst it's being read and streamed to the output by the video signal generation part. That's the other thing I need to remember about FPGAs - they're not one computation block, all their components run in parallel with each other.
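A minimal sketch of that FIFO idea, modelled in C: the Z80 side pushes command/data bytes written to an I/O port, and the drawing engine pops them when it is ready. The 16-entry depth and the "return false when full" policy are assumptions; in the FPGA this would typically be a block-RAM FIFO, and a full condition could instead drive the Z80's /WAIT line or a status bit the CPU polls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Command FIFO between the Z80 I/O port and the drawing engine.
   FIFO_SIZE must be a power of two that divides 256 (16 is arbitrary). */
#define FIFO_SIZE 16u

struct cmd_fifo {
    uint8_t buf[FIFO_SIZE];
    uint8_t head;   /* free-running write count (Z80 side) */
    uint8_t tail;   /* free-running read count (drawing engine side) */
};

/* With free-running 8-bit counters, head - tail is the fill level. */
static unsigned fifo_level(const struct cmd_fifo *f)
{
    return (uint8_t)(f->head - f->tail);
}

/* Z80 OUT to the command port; returns false if the FIFO is full. */
bool fifo_push(struct cmd_fifo *f, uint8_t byte)
{
    if (fifo_level(f) == FIFO_SIZE)
        return false;
    f->buf[f->head % FIFO_SIZE] = byte;
    f->head++;
    return true;
}

/* Drawing engine takes the next byte; returns false if the FIFO is empty. */
bool fifo_pop(struct cmd_fifo *f, uint8_t *byte)
{
    if (fifo_level(f) == 0)
        return false;
    *byte = f->buf[f->tail % FIFO_SIZE];
    f->tail++;
    return true;
}
```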

Quote from: Berni on October 18, 2019, 08:09:10 pm

Obviously implementing all the draw calls of a modern graphics card is an insanely huge task, but to get some impressive graphics going you only need a few simple ones: things like filling a rectangle with a single color or drawing one bitmap on top of another, perhaps also with transparency effects and alpha blending. Later on you might want to add support for 2D transformation matrices, as that requires little hardware but lets you implement the kind of stuff the SNES can do in Mode 7. All of this can result in graphics that surpass the 16-bit console era, because you have the unfair advantage of roughly 10 times more memory bandwidth than those consoles had. But for 8-bit games, what you will find most useful is tilemap and sprite support, since it requires very little data to be manipulated by the slow CPU, so it's very fast even on wimpy old chips.

Absolutely - I think this will be the biggest area for development then. The actual architecture of the graphics card sounds like it will be really simple, aside from level converters for the 5v to 3v3 and some basic IO address decoding, there will be little else other than the FPGA and HDMI output. In fact, I suppose there's no reason why I can't integrate the IO address decoding into the FPGA as well? This is getting easier by the minute!
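Moving the I/O decoding into the FPGA is indeed straightforward. This C model of the decode logic assumes the GPU occupies ports 0x40-0x43 (an arbitrary choice). Two Z80 details matter: the 8-bit port number appears on A7..A0 (the upper address byte carries the A or B register), and /IORQ is also asserted together with /M1 during an interrupt acknowledge, so a genuine I/O cycle is /IORQ low with /M1 high.

```c
#include <stdbool.h>
#include <stdint.h>

/* Z80 bus signals as sampled by the FPGA; *_n signals are active low. */
struct z80_bus {
    uint16_t addr;
    bool iorq_n, m1_n, rd_n, wr_n;
};

/* Hypothetical port assignment: the GPU answers on ports 0x40-0x43. */
#define GPU_PORT_BASE 0x40u
#define GPU_PORT_MASK 0xFCu   /* compare A7..A2; A1..A0 pick the register */

/* Register number (0-3) for a GPU I/O write, or -1 if this cycle isn't one. */
int gpu_io_write_select(const struct z80_bus *b)
{
    /* /IORQ together with /M1 low is an interrupt acknowledge, not I/O. */
    bool io_cycle = !b->iorq_n && b->m1_n;
    /* For OUT (n),A and OUT (C),r the port number is on A7..A0 only. */
    uint8_t port = (uint8_t)(b->addr & 0xFFu);
    if (io_cycle && !b->wr_n && (port & GPU_PORT_MASK) == GPU_PORT_BASE)
        return port & 0x03;
    return -1;
}
```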

Quote from: Berni on October 18, 2019, 08:09:10 pm

Oh, and you want to avoid things like USB. Yes, you can get a USB host IP module for an FPGA, but that will only implement the actual USB port and its transfer protocol (a glorified UART); you still need the higher-level protocols that initialize USB devices, plus drivers to talk to them. That needs a CPU to handle, so you will likely end up with a softcore CPU inside your FPGA that's actually more powerful than the Z80 itself, sitting there just to run drivers for USB devices. If you want a mouse and keyboard, stick to the simple PS/2.

asmi · « **Reply #37 on:** October 19, 2019, 12:20:09 am »

If you want to learn how to design a computer system from the ground up, start designing your own RISC-V RV32I core. It's got only 37 instructions (if I can count lol), and there is a full-blown gcc compiler for it, so you can use real-deal C/C++! Once you have a core, you can start designing peripherals and figuring out how to connect them to the CPU bus, then at some point you will realize you need a DMA controller to move data around without CPU involvement - this will force you to implement a multi-master bus so that peripherals can do bus mastering as well as the CPU core, etc. There is almost no limit to how far you can go with this - implementing a multi-core system, adding support for external expansion buses (like PCI or PCI Express), and so on!
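If you do take that route one day, instruction decode is a gentle place to start, because RV32I keeps the opcode and register fields at fixed bit positions across its instruction formats. This sketch extracts those fields plus the sign-extended I-type immediate; it covers decoding only, not execution.

```c
#include <stdint.h>

/* Fields shared by the RV32I base instruction formats. */
struct rv32_fields {
    uint8_t opcode, rd, funct3, rs1, rs2, funct7;
    int32_t imm_i;   /* sign-extended I-type immediate (ADDI, LW, JALR...) */
};

struct rv32_fields rv32_decode(uint32_t insn)
{
    struct rv32_fields f;
    f.opcode = insn & 0x7F;           /* bits 6..0   */
    f.rd     = (insn >> 7) & 0x1F;    /* bits 11..7  */
    f.funct3 = (insn >> 12) & 0x07;   /* bits 14..12 */
    f.rs1    = (insn >> 15) & 0x1F;   /* bits 19..15 */
    f.rs2    = (insn >> 20) & 0x1F;   /* bits 24..20 */
    f.funct7 = (insn >> 25) & 0x7F;   /* bits 31..25 */
    /* I-type immediate is bits 31..20; sign-extend from bit 31. */
    f.imm_i = (int32_t)(insn >> 20);
    if (insn & 0x80000000u)
        f.imm_i -= 4096;
    return f;
}
```

For example, 0x00500093 decodes as opcode 0x13 (OP-IMM), rd = 1, rs1 = 0, funct3 = 0 and immediate 5, i.e. ADDI x1, x0, 5.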

For video I really recommend using HDMI because it's so ridiculously easy to implement in its most basic form (RGB-24bit output, video-only), and lower resolutions aren't very taxing on the FPGA, performance-wise or layout-wise.

hamster_nz · « **Reply #38 on:** October 19, 2019, 12:31:47 am »

Just wanting to be clear... Most people are talking about implementing DVI-D, not HDMI, even though they use the same physical architecture and low-level coding scheme. True HDMI is a step up, and involves adding data islands, audio, BCH codes and so on...

Best reference documents for DVI-D are the Digital Display Working Group's specifications.
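For reference, this is roughly what "the same low-level coding scheme" involves on each of the three DVI data channels: the TMDS 8b/10b encoder, modelled here in C following the algorithm in the DVI 1.0 specification. During active video each byte is first transition-minimised with an XOR or XNOR chain, then optionally inverted to keep the running DC balance near zero; during blanking one of four fixed control symbols is sent instead. It is a model for understanding the algorithm, not HDL.

```c
#include <stdint.h>

/* Number of 1 bits in a byte. */
static int ones8(uint8_t v)
{
    int n = 0;
    for (int i = 0; i < 8; i++)
        n += (v >> i) & 1;
    return n;
}

/* Encode one pixel byte (display enable high). *disparity is the running
   count of ones minus zeros sent on this channel; it starts at 0 and is
   reset to 0 while control symbols are sent. Bit 0 is transmitted first. */
uint16_t tmds_encode(uint8_t d, int *disparity)
{
    /* Stage 1: minimise transitions with an XOR or XNOR chain. */
    int n1d = ones8(d);
    int use_xnor = n1d > 4 || (n1d == 4 && (d & 1) == 0);
    unsigned qm = d & 1u;
    for (int i = 1; i < 8; i++) {
        unsigned bit = ((qm >> (i - 1)) ^ ((unsigned)d >> i)) & 1u;
        if (use_xnor)
            bit ^= 1u;
        qm |= bit << i;
    }
    if (!use_xnor)
        qm |= 1u << 8;                  /* q_m[8] = 1 flags the XOR chain */

    /* Stage 2: DC balance; bit 9 set means bits 7..0 were inverted. */
    int n1 = ones8((uint8_t)qm), n0 = 8 - n1;
    unsigned qm8 = (qm >> 8) & 1u;
    unsigned data = qm & 0xFFu, inv = ~qm & 0xFFu;
    uint16_t out;
    if (*disparity == 0 || n1 == n0) {
        out = (uint16_t)((qm8 ^ 1u) << 9 | qm8 << 8 | (qm8 ? data : inv));
        *disparity += qm8 ? (n1 - n0) : (n0 - n1);
    } else if ((*disparity > 0 && n1 > n0) || (*disparity < 0 && n0 > n1)) {
        out = (uint16_t)(1u << 9 | qm8 << 8 | inv);
        *disparity += 2 * (int)qm8 + (n0 - n1);
    } else {
        out = (uint16_t)(qm8 << 8 | data);
        *disparity += -2 * (int)(qm8 ^ 1u) + (n1 - n0);
    }
    return out;
}

/* Control symbols for blanking, indexed by (C1 << 1) | C0.
   On the blue channel C0 = HSYNC and C1 = VSYNC. */
static const uint16_t tmds_ctrl[4] = { 0x354, 0x0AB, 0x154, 0x2AB };
```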

jhpadjustable · « **Reply #39 on:** October 19, 2019, 03:46:44 am »

Quote from: nockieboy on October 18, 2019, 09:56:46 pm

I suspect I'm going to be reading and analysing what you've written here for months to come. But from what I can gather, there's the opportunity to steal cycles from the Z80 and open the address and data buses up to peripherals (like the video controller) whilst the Z80 is pausing for breath in the middle of its memory/IO cycles? That's some major optimisation!!

You do have the gist of it correct. You don't need to pay too much mind to the ramblings of an animated professor in a lecture. Try a drier summary: a system bus, designed for 2T bus cycles, can be time-multiplexed into 4T frames of two fixed time slots, each 2T long (one bus cycle). Each slot is dedicated to an independent master (or masters), which may perform up to one bus cycle every frame. You can place some glue logic between the Z80 bus (or any other master(s) of your choice) and the system bus to connect/disconnect the master's address, data, and control signals to/from the bus at the beginning/end of its time slot, to smooth over timing differences between the 2T system bus and the Z80's 4T cycles, to hold data received from the system bus until the master is ready for it, and to keep the Z80's machine cycles synchronized with its time slot. The 1980s-era home computer, in 127 words.
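A very simplified C model of that time-slot scheme, assuming slot 0 (T-states 0-1 of each 4T frame) belongs to the CPU and slot 1 (T-states 2-3) to the video fetcher. It answers only two questions: who owns the bus at a given T-state, and how many T-states the glue logic would have to stall (for example via /WAIT) a CPU bus cycle that wants to start at an arbitrary point so that it lines up with the start of its slot.

```c
#include <stdint.h>

/* Illustrative slot assignment within each 4T frame. */
enum bus_master { MASTER_CPU = 0, MASTER_VIDEO = 1 };

/* Owner of the system bus during a given T-state. */
enum bus_master slot_owner(uint32_t tstate)
{
    return (tstate % 4u) < 2u ? MASTER_CPU : MASTER_VIDEO;
}

/* T-states of stall needed so a CPU bus cycle requested at `tstate`
   begins exactly at the start of the CPU's 2T slot. */
unsigned cpu_wait_states(uint32_t tstate)
{
    return (4u - (tstate % 4u)) % 4u;
}
```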

If you're a visual learner, print out and clip out the timing diagrams from the Timing section of the Z80 data sheet and lay them on a table, with each start of T1 aligned vertically. Use another sheet of paper to cover from the end of T2 onward and observe that you have enough information to start a cycle. Have scissors handy to cut the memory cycles at the beginning of T3 and introduce one TW of space between. Print/cut out another set, maybe in a different color, and lay them out offset by 2T from the first set. Shuffle them around and examine the interplay.

Quote

Did you design these systems back in the 80's?

No, I was just a boy at that time, but I gave one of the Amiga designers a ride to San Francisco once.

I did have a few of those 8-bit machines and a couple of 16-bit machines, and fell in passionate, enduring love with PEEK and POKE. I followed the demo scene for a bit later on, even tried to write a few screens that turned out nothing to write home about.

Quote

That's really what I wanted to know - is there some big reason NOT to go for one approach over another. I'm thinking for the sake of simplicity, I'll go with the Z80 only having a simple IO connection to the FPGA. At least in the first instance - I can always change to an alternative method of interfacing later if the need arises, I guess.

You would need to design the multiplexed bus into the system from the beginning. There's always the next machine, right?

If you're using private, single-ported frame buffer RAM, the video system doesn't master the system bus, so time-multiplexing the system bus would be a needless complication, especially if there is already a "normal" DMA controller in the system. Time-multiplexing may still be useful on the private frame buffer side, to arbitrate between the pixel serializer, the graphics coprocessor, and the system bus interface. As powerful as FPGAs are, they're not quite quantum computers and there will still be single resources that must be shared by multiple clients. DRAM chips, even with their multifarious burst modes, still take a moment to close a row and open a new one.
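Time-slotting is one way to share a private frame buffer; a request-based arbiter is another. This C sketch uses that swapped-in approach rather than the time-multiplexing described above: the pixel serializer always wins because it has a hard deadline, and the graphics coprocessor and the system-bus interface take turns when both are waiting, so neither starves. The priorities and the alternation rule are assumptions.

```c
#include <stdbool.h>

/* Clients of the private frame-buffer RAM. */
enum fb_client { FB_NONE = -1, FB_SERIALIZER, FB_COPROCESSOR, FB_HOST_BUS };

struct fb_arbiter {
    bool host_went_last;   /* round-robin state for the two lower clients */
};

/* Decide who gets the next RAM access. */
enum fb_client fb_arbitrate(struct fb_arbiter *a, bool serializer_req,
                            bool coproc_req, bool host_req)
{
    if (serializer_req)             /* the pixel must reach the output on time */
        return FB_SERIALIZER;
    if (coproc_req && host_req) {   /* both waiting: alternate */
        enum fb_client g = a->host_went_last ? FB_COPROCESSOR : FB_HOST_BUS;
        a->host_went_last = (g == FB_HOST_BUS);
        return g;
    }
    if (coproc_req) {
        a->host_went_last = false;
        return FB_COPROCESSOR;
    }
    if (host_req) {
        a->host_went_last = true;
        return FB_HOST_BUS;
    }
    return FB_NONE;
}
```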

I will call your attention to the timing diagrams for memory vs. I/O accesses, and point out that I/O accesses are 1T longer than memory accesses, by design. That may add up when moving a lot of data from system memory to the frame buffer without DMA. I'll also point out that there are 16-bit load/store instructions which save one or two instruction fetches for every two bytes, and may be more convenient (and faster) for programming, but there are no 16-bit I/O instructions as far as I can tell.

Since you do have the full address bus at the FPGA's disposal, and you prefer not to use a memory-mapped window into frame buffer RAM, I offer the idea of mapping individual registers into a 4kB or so block of address space rather than using a single command/data register pair.

Your display interface won't have as much hidden or inaccessible state as a command/data port pair paradigm would imply
The modules in your display controller will be less co-dependent
As a consequence of both of the above, you can change controller parameters (e.g. palette, active area start/stop) in response to "beam position"-triggered interrupt handlers for special effects without saving/restoring any state
Easier to program, especially from C
Could be easier to query by the processor, in case that is necessary
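A rough sketch of the address decode for such a 4 kB register block, plus a helper that computes where one palette register would sit. The base address 0xE000 and the register layout are hypothetical. The point is that every register has a fixed address of its own, so, for example, a raster interrupt handler can rewrite a palette entry with a few plain stores and no command/data port state to save and restore.

```c
#include <stdint.h>

/* Hypothetical placement: a 4 kB block at 0xE000-0xEFFF of the Z80's 64 kB
   memory map. The FPGA claims an access when A15..A12 match and uses
   A11..A0 as the register offset. */
#define REG_BLOCK_BASE 0xE000u
#define REG_BLOCK_MASK 0xF000u

/* Hypothetical layout inside the block. */
#define REG_PALETTE 0x000u   /* entry i at offsets 3*i .. 3*i+2: R, G, B */
#define REG_CONTROL 0x300u   /* display enable, mode bits, ... */

/* Register offset (0-4095) for an address inside the block, else -1. */
int reg_block_offset(uint16_t addr)
{
    if ((addr & REG_BLOCK_MASK) != REG_BLOCK_BASE)
        return -1;
    return (int)(addr & ~REG_BLOCK_MASK & 0xFFFFu);
}

/* CPU-side address of one palette component (0 = R, 1 = G, 2 = B). */
uint16_t palette_reg_addr(uint8_t index, uint8_t component)
{
    return (uint16_t)(REG_BLOCK_BASE + REG_PALETTE + 3u * index + component);
}
```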

Quote

Well, I'm aware there's a significant software overhead due to the vast range of HID devices - I would literally just be looking for basic keyboard reading - but perhaps I'll have to stick to PS2 then.

Fair enough. You can add USB later if one day you happen to wake up really ambitious and feel like porting code from the MSX USBorne project.

For the SL811HS you need only two addresses in I/O space, eight data lines, and the usual bus handshaking signals, which could fit very easily on an expansion board.

asmi · « **Reply #40 on:** October 19, 2019, 04:49:44 am »

Quote from: hamster_nz on October 19, 2019, 12:31:47 am

Just wanting to be clear... Most people are talking about implementing DVI-D, not HDMI, even though they use the same physical architecture and low-level coding scheme. True HDMI is a step up, and involves adding data islands, audio, BCH codes and so on...

Isn't all of that stuff optional? My memory on HDMI spec is rather rusty atm.
But whatever - the point is - it works, and fundamentally it's just a couple of counters and a few comparators to figure out blanking areas - which incidentally you will need for VGA too; the only difference is the output SERDES. On the other hand, you get the full 24-bit color space to work with without needing any sort of DACs (of which you'll need three for VGA - again, if my memory serves me), and you need just 8 pins plus a few auxiliary ones like HPD and the I2C channel, instead of a ton of pins to external parallel DACs. And you get a clear upgrade path if you ever decide you want one - this one is the most important for me (even more so since I'm working on a DisplayPort module so that I can go beyond FullHD).
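asmi's "couple of counters and a few comparators" looks roughly like this as a C model, using the standard 640x480 @ 60 Hz timing (800 pixel clocks per line, 525 lines per frame, nominal 25.175 MHz pixel clock, both syncs active low). For VGA the outputs drive the sync pins and blank the DACs; for DVI, display_enable selects between pixel data and control symbols going into the TMDS encoders.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard 640x480 @ 60 Hz timing. */
#define H_VISIBLE 640
#define H_FRONT    16
#define H_SYNC     96
#define H_BACK     48
#define H_TOTAL   (H_VISIBLE + H_FRONT + H_SYNC + H_BACK)   /* 800 */
#define V_VISIBLE 480
#define V_FRONT    10
#define V_SYNC      2
#define V_BACK     33
#define V_TOTAL   (V_VISIBLE + V_FRONT + V_SYNC + V_BACK)   /* 525 */

struct video_timing { uint16_t h, v; };   /* the two counters */

struct video_outputs { bool hsync_n, vsync_n, display_enable; };

/* The comparators: sync and blanking derived from the counter values. */
struct video_outputs timing_decode(const struct video_timing *t)
{
    struct video_outputs o;
    o.hsync_n = !(t->h >= H_VISIBLE + H_FRONT && t->h < H_VISIBLE + H_FRONT + H_SYNC);
    o.vsync_n = !(t->v >= V_VISIBLE + V_FRONT && t->v < V_VISIBLE + V_FRONT + V_SYNC);
    o.display_enable = t->h < H_VISIBLE && t->v < V_VISIBLE;
    return o;
}

/* The counters: advance by one pixel clock. */
void timing_tick(struct video_timing *t)
{
    if (++t->h == H_TOTAL) {
        t->h = 0;
        if (++t->v == V_TOTAL)
            t->v = 0;
    }
}
```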

legacy · « **Reply #41 on:** October 19, 2019, 07:19:37 am »

Quote from: nockieboy on October 18, 2019, 09:56:46 pm

I'll go with the Z80 only having a simple IO connection to the FPGA

With a physical Z80 cpu chip? at 5V? having the FPGA's IO at 3.3V?

hamster_nz · « **Reply #42 on:** October 19, 2019, 07:43:22 am »

Quote from: asmi on October 19, 2019, 04:49:44 am

Quote from: hamster_nz on October 19, 2019, 12:31:47 am
Just wanting to be clear... Most people are talking about implementing DVI-D, not HDMI, even though they use the same physical architecture and low-level coding scheme. True HDMI is a step up, and involves adding data islands, audio, BCH codes and so on...
Isn't all of that stuff optional? My memory on HDMI spec is rather rusty atm.
But whatever - the point is - it works, and fundamentally it's just a couple of counters and a few comparators to figure out blanking areas - which incidentally you will need for VGA too; the only difference is the output SERDES. On the other hand, you get the full 24-bit color space to work with without needing any sort of DACs (of which you'll need three for VGA - again, if my memory serves me), and you need just 8 pins plus a few auxiliary ones like HPD and the I2C channel, instead of a ton of pins to external parallel DACs. And you get a clear upgrade path if you ever decide you want one - this one is the most important for me (even more so since I'm working on a DisplayPort module so that I can go beyond FullHD).

If it doesn't have video guard bands, TERC4 data islands and the data island that defines the display format, it isn't HDMI, it's just DVI...

Let me know if you need a hand or advice with DisplayPort. A few years ago I knew the older spec backwards, and got 4k streams going on a few different boards... These might help:

https://github.com/hamsternz/FPGA_DisplayPort

https://github.com/hamsternz/DisplayPort_Verilog

nockieboy · « **Reply #43 on:** October 19, 2019, 09:13:10 am »

Quote from: asmi on October 19, 2019, 12:20:09 am

If you want to learn how to design a computer system from the ground up, start designing your own RISC-V RV32I core...

One thing at a time, asmi.

Quote from: asmi on October 19, 2019, 12:20:09 am

For video I really recommend using HDMI because it's so ridiculously easy to implement in its most basic form (RGB-24bit output, video-only), and lower resolutions aren't very taxing on the FPGA, performance-wise or layout-wise.

Yes, I'm sold on HDMI and will be building with that end goal in mind. Plus the Spartan 6 - my FPGA of choice, it seems - appears to have hardware support for HDMI in its IO blocks.

Quote from: jhpadjustable on October 19, 2019, 03:46:44 am

The 1980s-era home computer, in 127 words.

You make it sound so simple...

Quote from: jhpadjustable on October 19, 2019, 03:46:44 am

No, I was just a boy at that time, but I gave one of the Amiga designers a ride to San Francisco once. I did have a few of those 8-bit machines and a couple of 16-bit machines, and fell in passionate, enduring love with PEEK and POKE. I followed the demo scene for a bit later on, even tried to write a few screens that turned out nothing to write home about.

Oh wow - I loved my Amiga(s). I had an A500+ and then upgraded to the A1200 with a hard drive when I went to university. They were going to rule the world - then the PC happened.

PEEK and POKE were a couple of the first commands I created for my system's monitor program. They were like dark magic back when I was a kid in the 80's.

Quote from: jhpadjustable on October 19, 2019, 03:46:44 am

I will call your attention to the timing diagrams for memory vs. I/O accesses, and point out that I/O accesses are 1T longer than memory accesses, by design. That may add up when moving a lot of data from system memory to the frame buffer without DMA. I'll also point out that there are 16-bit load/store instructions which save one or two instruction fetches for every two bytes, and may be more convenient (and faster) for programming, but there are no 16-bit I/O instructions as far as I can tell.

Well, I'll have to wait to get a prototype up and running to see the performance of the IO interface and decide if I need to look at something different. I'm not a professional games programmer, so I'm not talking about blockbuster graphics, parallax scrolling and FMV when I talk about what I want from game graphics, but I guess it still waits to be seen what the interface will be capable of. I'm aware that memory access is faster than IO access because of the extra commands and the additional WAIT state that the Z80 inserts into IO cycles, I'm just not sure how that will play out practically - I'm hoping the difference will not be noticeable.

Quote from: jhpadjustable on October 19, 2019, 03:46:44 am

Since you do have the full address bus at the FPGA's disposal, and you prefer not to use a memory-mapped window into frame buffer RAM, I offer the idea of mapping individual registers into a 4kB or so block of address space rather than using a single command/data register pair.

Let me make sure I understand this - any writes to the appropriate memory space would be picked up by the FPGA and read into internal memory which effectively shadows the system's memory 'window'? Actually, that would be really useful to transport large amounts of data from the system into the GPU. Rather than using a sequence of, say, (a minimum of) 64 IO writes to create a sprite in the GPU's memory, the whole thing could be copied into the RAM 'window' using the faster memory commands...

Quote from: legacy on October 19, 2019, 07:19:37 am

Quote from: nockieboy on October 18, 2019, 09:56:46 pm
I'll go with the Z80 only having a simple IO connection to the FPGA

With a physical Z80 cpu chip? at 5V? having the FPGA's IO at 3.3V?

Yes, a physical Z80 at 5v and the FPGA at 3.3v. I did mention earlier that I'd be level-shifting the voltages between the system and GPU, probably using 74LVC buffers or transparent latches or whatever, so don't panic.

legacy · « **Reply #44 on:** October 19, 2019, 09:50:40 am »

Quote from: nockieboy on October 19, 2019, 09:13:10 am

Yes, a physical Z80 at 5v and the FPGA at 3.3v. I did mention earlier that I'd be level-shifting the voltages between the system and GPU, probably using 74LVC buffers or transparent latches or whatever, so don't panic.

No panic, just: I did this stuff two years ago, and it consumed a lot of precious time, so looking back, I have some regrets about my past choices. Anyway, I am not here to motivate or demotivate people, so do as you wish. Just, if I were you, I would avoid adding extra chips to the PCB.

There are 3.3V Z80 compatible cores.

legacy · « **Reply #45 on:** October 19, 2019, 10:25:40 am »

Like this

james_s · « **Reply #46 on:** October 19, 2019, 05:22:18 pm »

There are some CPUs, such as the 6800, for which there are no cycle-accurate softcores (at least not open-source ones). The Z80, however, already has more than one very well tested softcore that works just like the real thing, so there's no need to have a real physical Z80 in the mix if you already have the FPGA. The softcore can do everything the original can, and much more if you wish.

nockieboy · « **Reply #47 on:** October 19, 2019, 07:49:11 pm »

Quote from: james_s on October 19, 2019, 05:22:18 pm

There are some CPUs, such as the 6800, for which there are no cycle-accurate softcores (at least not open-source ones). The Z80, however, already has more than one very well tested softcore that works just like the real thing, so there's no need to have a real physical Z80 in the mix if you already have the FPGA. The softcore can do everything the original can, and much more if you wish.

Well, I'm building this GPU for a hardware Z80 system. Whilst I appreciate that the FPGA could do everything my hardware system does, that's not the point of this little project. Perhaps an FPGA is total overkill, though.

I've been looking a little more closely at the Spartan LX45 and it's looking less and less likely that I'll be able to use it, even if I could justify the cost. I think BGA is a step too far for my soldering skills and equipment at this stage, and the sheer number of pins on those FPGAs will stretch my DipTrace licence past breaking point. One way around it is to just use one of the cheap development boards and plug that straight into my 'GPU card'. Limited IO, but with the FPGA, SDRAM, clock and programming circuitry done for me...

ledtester · « **Reply #48 on:** October 19, 2019, 07:49:50 pm »

You might be interested in the video chip being developed for the 8-bit guy's "Dream Machine":

(starts at 9:50)

https://youtu.be/sg-6Cjzzg8s?t=9m50s

jhpadjustable · « **Reply #49 on:** October 19, 2019, 08:03:42 pm »

Quote from: nockieboy on October 19, 2019, 09:13:10 am

Oh wow - I loved my Amiga(s). I had an A500+ and then upgraded to the A1200 with a hard drive when I went to university. They were going to rule the world - then the PC happened. PEEK and POKE were a couple of the first commands I created for my system's monitor program. They were like dark magic back when I was a kid in the 80's.

Excellent. That gives me a touchstone to explain some things by analogy. By the way, you didn't have anything important to do this weekend, did you? https://archive.org/details/Amiga_Hardware_Reference_Manual_1985_Commodore

Just for interest, you may be aware there are HDL implementations of the Amiga OCS that might be imported directly into your design, with minimal modifications. A Z80 driving the OCS chip set could make for a pretty wild experiment, even at 1/2 the memory bandwidth.

Quote

Well, I'll have to wait to get a prototype up and running to see the performance of the IO interface and decide if I need to look at something different. I'm not a professional games programmer, so I'm not talking about blockbuster graphics, parallax scrolling and FMV when I talk about what I want from game graphics, but I guess it still waits to be seen what the interface will be capable of. I'm aware that memory access is faster than IO access because of the extra commands and the additional WAIT state that the Z80 inserts into IO cycles, I'm just not sure how that will play out practically - I'm hoping the difference will not be noticeable.

Digital design is still not a qualitative discipline, but I'll bite anyway. 75% of theoretical bandwidth won't be very noticeable for relatively low-intensity usage like ping-pong or the like. But also see below.
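Assuming the 75% figure comes from comparing cycle lengths (a Z80 memory read or write machine cycle is 3 T-states, an I/O cycle is 4 including the automatically inserted wait state), the arithmetic is just this. It counts only the data-transfer machine cycles and ignores opcode fetches, so the results are upper bounds, not real throughput.

```c
/* T-states per data-transfer machine cycle on the Z80. */
#define MEM_CYCLE_T 3.0   /* memory read/write */
#define IO_CYCLE_T  4.0   /* I/O read/write, including the automatic wait state */

/* Upper bound on transfers per second if every cycle moved one byte. */
double max_transfers_per_sec(double clock_hz, double tstates_per_cycle)
{
    return clock_hz / tstates_per_cycle;
}

/* I/O bandwidth as a fraction of memory bandwidth: 3/4. */
double io_vs_memory_ratio(void)
{
    return MEM_CYCLE_T / IO_CYCLE_T;
}
```

At an assumed 8 MHz clock that is at most about 2.67 million memory transfers per second versus 2 million I/O transfers.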

Quote

Let me make sure I understand this - any writes to the appropriate memory space would be picked up by the FPGA and read into internal memory which effectively shadows the system's memory 'window'? Actually, that would be really useful to transport large amounts of data from the system into the GPU. Rather than using a sequence of, say, (a minimum of) 64 IO writes to create a sprite in the GPU's memory, the whole thing could be copied into the RAM 'window' using the faster memory commands...

That would be the memory-mapped window into frame buffer RAM I mistakenly believed you disfavored, but yes, I do think it's a very good idea. I'd also be sure it services reads as well. In fact, once it is servicing both reads and writes, you could make the window very large, like a megabyte give or take, and perhaps scrap the bank switching entirely. Then you have something very much like Amiga chip RAM, in that whatever memory isn't being used by the display can be used by the Z80 for general purposes and/or graphics...

But I was actually proposing that you memory- (or I/O-) map the control registers, using the system bus control/address/data signals to more or less directly read/write registers inside the FPGA and control the video hardware, analogous to the common idiom of using 74377 or similar ICs with suitable decoding as byte-wide input-output ports. C pseudocode:

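```c
#include <stdint.h>

/* Illustrative register layout; the real one is whatever the FPGA implements. */
struct my_video_chip {
    struct { uint8_t red, green, blue; } palette[256];
    uint8_t displayenable;                 /* bit 0: display on */
};

/* volatile: every access must reach the hardware. 0xDFF000 is the Amiga custom chip base; a Z80 needs a 16-bit address. */
static volatile struct my_video_chip *const video = (volatile struct my_video_chip *)0xDFF000; /* ;) */

void setpalettecolor_reg(uint8_t index, uint8_t red, uint8_t green, uint8_t blue)
{
  video->palette[index].red = red;
  video->palette[index].green = green;
  video->palette[index].blue = blue;
}
void enabledisplay_reg(void)
{
  video->displayenable |= 1; /* SET 0, (HL) */
}
```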

Quote

DipTrace

You could always switch horses to KiCAD. #justsayin

