Would you be interested in contributing to SDCC, a free C compiler targeting small devices?
Probably not, I've not seen much open-source code I'd be happy working on... but that's a different discussion.
Wait, are you the Dunfield C Dunfield?
Mea culpa!
Regarding the source, I've not made it publicly available, but I've been known to give it out to interested parties.
For things not pertaining to the Padauk discussion, please contact me off list. Either through the messaging here (I'll try to check it at least every couple of days), or to my email at:
my-first-name{dot}my-last-name{@}gmail{dot}com
Regarding simulation:
With modern tools, "switch" can be as efficient as a function call table, and you can avoid the overhead of functions entry/exit on each instruction (which can be several instructions to establish and release a stack frame).
I find the most difficult part doing the flags... C has a disadvantage of not having visibility to the processor flags, so when coding a simulator in C you need to work out the flag updates manually. This is usually more operations than performing the main operation of the simulated instruction.
Back in the day, when PC's were not a whole lot faster than the embedded systems we were emulating, I coded the simulation functions in assembly language, where I could reduce decoding to a very simple jump table, and access the flags set by the operation directly (which are usually at least similar to the setting of flags on the target processor.
My EMILY52 simulator could emulate an 8051/52 in real time on a Pentium1 (or even a decent 486). The simulation engine was coded in 8086 assembly and had an advantage that the 8051 flags were very similar to the 8086 flags. Instruction that didn't set flags would jump directly back to the next decode, instructions which set flags jumped to one of several routines which saved the flags in question (I think there were 3 or 4 different combinations of flags set by instructions) before heading on back to the next decode.
For instructions which used the flags, I simply loaded the 8086 flags from the saved flags before executing the instruction. When the flags were read as a byte, I used a lookup table to translate the saved 8086 flags into the corresponding 8051 flags positions.
8051 registers were only saved/restored to 8086 registers when the simulation stopped or started ... while running, they resided in 8086 registers and did not have to be loaded "per instruction".
Thing ran fast! - on more modern systems, it runs considerably faster than a hardware 805x ... It was so "streamlined" that the simulation engine had no "test for exit" .. just a tight instruction decode loop. When the user hit "STOP" I would plant a JMP instruction into the loop to break it which was then patched back by the exit handler.
Another technique I've used in the past to achieve high-speed simulation, which works well if:
1) You can live with variance in the speed the first time a block of code gets executed,
2) You don't support or accept higher overhead for self-modifying code
Is to have an array of JMP targets for every address (easy if you processor has 1024 words of code memory, harder if it has megs)... At the beginning, each JMP targets a routine which decodes the instruction and updates the JMP target to point to the handler for that instruction before launching it.
On subsequent passes the JMP branches directly to the correct instruction handler, and never even has to look at the opcode.
This is particularity effective if the opcodes are complex to decode words. For simple opcodes, a simple JMP table will be nearly as fast and much simpler to debug.
On a PC I'd probably use a simple JMP table for the PMS150, 13 bit = 8k words (32k bytes) and would give you one simple direct jump to the correct handler.
On the STM with 64K of code, 32K would be a bit much, I'd probably divide the opcodes into blocks based on which bits are used and decode with ANDs and a few switches.
But I haven't really looked into it that far yet...
Dave