Ah the memories it brings back.
In the old 386 days I wrote my own DOS extender in assembler, so that I could write a MOD file player in 32-bit code rather than the old 8086 segmented real mode. It went something like this:
- build the local and global descriptor tables that mapped virtual memory addresses to physical memory
- set up a whole lot of hardware interrupt mapping, which could then call the BIOS ISRs in real mode
- tell the keyboard controller to enable the AND gate on A20
- set up a new reset vector by mapping out the old one
- get the keyboard controller to reset the CPU, to make it jump to the 'new' reset vector
- then had to use 'bounce buffers' for DMA, as the sound card could only DMA from the first MB of RAM
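
For anyone who never had the pleasure, here is a rough sketch of what "building a descriptor table" from the first step boils down to: packing base, limit and attribute bits into awkward 8-byte entries. It is in Python purely for illustration (the original was assembler), and the helper name `make_descriptor` and the flat 4 GB code/data layout are my assumptions, not a reconstruction of the extender described above.

```python
import struct

def make_descriptor(base, limit, access, flags):
    """Pack one 8-byte x86 segment descriptor (hypothetical helper).

    base   : 32-bit linear base address of the segment
    limit  : 20-bit limit (in bytes, or in 4 KB pages when the G flag is set)
    access : 8-bit access byte (present bit, privilege level, type)
    flags  : 4-bit flags nibble (G, D/B, L, AVL)
    """
    if not 0 <= base <= 0xFFFFFFFF:
        raise ValueError("base must fit in 32 bits")
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must fit in 20 bits")
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                               # limit bits 15..0
        base & 0xFFFF,                                # base bits 15..0
        (base >> 16) & 0xFF,                          # base bits 23..16
        access & 0xFF,                                # access byte
        ((flags & 0xF) << 4) | ((limit >> 16) & 0xF), # flags + limit bits 19..16
        (base >> 24) & 0xFF,                          # base bits 31..24
    )

# Assumed flat layout: a null entry, then 4 GB code and data segments.
NULL_DESC = bytes(8)                                  # entry 0 must be null
CODE32 = make_descriptor(0, 0xFFFFF, 0x9A, 0xC)       # ring 0 code, G=1, D=1
DATA32 = make_descriptor(0, 0xFFFFF, 0x92, 0xC)       # ring 0 data, G=1, B=1
gdt = NULL_DESC + CODE32 + DATA32

# A selector is the table index times 8 (the low 3 bits hold TI and RPL).
CODE_SELECTOR = 1 * 8
DATA_SELECTOR = 2 * 8
```

Note how the base and limit are scattered across the entry: that split is a leftover from the 286 descriptor format, which is a small taste of the kludge-on-kludge layering.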
Good times
There are now so many kludges on top of kludges on top of workarounds that it really would not be fun to do Intel bare metal any more, unless you want to rediscover a whole lot of knowledge that should remain forgotten... HDD addressing limits are one example that comes to mind.
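
To put some numbers on the HDD limits, here is the arithmetic behind a few of the commonly cited barriers. Which limits to list is my choice, and the 512-byte sector size is an assumption that held for drives of that era; the function names are made up for the example.

```python
SECTOR_BYTES = 512  # assumed sector size

def chs_capacity(cylinders, heads, sectors_per_track):
    """Bytes addressable with a cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES

def lba_capacity(address_bits):
    """Bytes addressable with a linear block address of the given width."""
    return (1 << address_bits) * SECTOR_BYTES

limits = {
    # Overlap of BIOS INT 13h (1024 cylinders) and IDE (16 heads, 63 sectors)
    "INT 13h + IDE CHS": chs_capacity(1024, 16, 63),
    # BIOS INT 13h alone: 1024 cylinders, 256 heads, 63 sectors
    "INT 13h CHS": chs_capacity(1024, 256, 63),
    # ATA with 28-bit LBA
    "28-bit LBA": lba_capacity(28),
    # 32-bit sector counts, as in MBR partition tables
    "32-bit LBA": lba_capacity(32),
}

for name, size in limits.items():
    print(f"{name}: {size:,} bytes ({size / 10**9:.1f} GB)")
```

Each of those walls needed its own workaround in BIOS, drivers or partitioning tools at the time.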
x86 has gone from a 16-bit CPU with at most a MB of RAM and a couple of 360 kB floppies to hyperthreaded multicore 64-bit CPUs with 100s of GB of RAM and TBs of disk, plus multiple nested virtualization layers. That deep under the hood it is pretty crufty, if you know what I mean.