Author Topic: EEVblog #1144 - Padauk Programmer Reverse Engineering (Read 470902 times)

david.given · « **Reply #75 on:** November 19, 2018, 04:57:52 pm »

Incidentally, some interesting ISA quirks I've noticed:

Stack grows up!
No stack-relative addressing (C will be hard)
Indirect addressing uses a 16-bit pointer where the MSB must be 0

So if you wanted C-style stack frames, god help you, your function preamble/postamble would look like:

mov a, lb@fp ; read old frame pointer
pushaf ; save
mov a, sp ; read stack pointer
mov fp, a ; set new frame pointer
add a, 8 ; allocate stack frame
mov sp, a ; update stack pointer
...main code here...
mov a, fp ; read current frame pointer
mov sp, a ; retract over frame
popaf ; load old frame pointer
mov fp, a ; reload old frame pointer
ret

Blech.

To read a value from the frame, you'd need:

Code: [Select]

mov a, 4 ; frame offset
add a, fp ; add frame pointer
mov lb@ptr, a
idxm a, ptr

...which is equally grim. Running traditional C on this would be a mug's game, of course.

Cowgol doesn't use stack frames, though, so it's golden.

You might be able to do some incredibly basic Forth, based around an opcode interpreter, with some hoops to jump through to be able to read words out of RAM as well as program memory, but it's probably only worth it for the lols --- I expect it would eat most of the program space and be slow to boot, and with the very limited I/O capability you wouldn't be able to interact with it once you'd done it. Admittedly, running Forth on a 3c machine would make for some pretty serious lols.

david.given · « **Reply #76 on:** November 19, 2018, 05:05:46 pm »

Yay, so much talking across people...

Re data tables: I think they're referring to this sort of thing:

Code: [Select]

; index in A, returns value in A
load_value_from_table:
  pcadd a ; jump to pc + a
  ret 42
  ret 99
  ret -6
  ret 127
  ...etc...

I haven't seen anything for accessing program data. I assume the assembler knows whether to generate an IO or RAM instruction by the type of the symbol you're addressing.

Re the other flags: the flags byte is at IO address 0, so you can test any individual bit with t1sn IO.3 or similar. I don't think we can get away with ignoring them...

DDunfield · « **Reply #77 on:** November 19, 2018, 05:12:50 pm »

Quote from: david.given on November 19, 2018, 05:05:46 pm

Re the other flags: the flags byte is at IO address 0, so you can test any individual bit with t1sn IO.3 or similar. I don't think we can get away with ignoring them...

Thanks, I hadn't picked up on that little detail yet (been busy working out the instructions). That explains it.

Dave

gdelazzari · « **Reply #78 on:** November 19, 2018, 05:48:10 pm »

Quote from: DDunfield on November 19, 2018, 04:49:12 pm

Seems like we've been duplicating some work.

I would like to insist on finding a way to better collaborate on this. Something like Slack could help since we would have a decent chat platform to keep everything organized and coordinate stuff. We could have various channels for RE of the programmer, RE of the ISA, tools implementation, etc... would anyone be interested?

Quote from: david.given on November 19, 2018, 04:57:52 pm

Cowgol doesn't use stack frames, though, so it's golden.

Your small language seems really interesting. I also don't see much sense in porting "full" C to this architecture; IMHO it's just less of a hassle to write assembly. A mixed language with assembly plus some high-level constructs is the best thing, which is in fact what PADAUK did with their Mini-C. If we make an open source toolchain it wouldn't be bad to have our own language for this thing. Of course the self-hosting capability of your compiler (while being really cool) is pointless here, but I can see myself liking a language like Cowgol with the possibility to interleave assembly code like PADAUK's Mini-C lets you do. I mean, instead of replicating their Mini-C we could take the opportunity to try something else.

Of course if someone manages to make C compile for these MCUs through SDCC or whatever, that would be really cool, but I would like to see how much overhead it adds, especially regarding function calls.

gslick · « **Reply #79 on:** November 19, 2018, 05:52:34 pm »

Quote from: DDunfield on November 19, 2018, 04:49:12 pm

Hi Guys,

Just found this thread. Seems like we've been duplicating some work.

I was planning to develop a PC based simulator to allow code to be quickly
tested and debugged, and also perhaps an STM32 based one which would allow
code to be tested "in circuit" with downloadability and decent debugging
capability. Would also like to eventually include device programming in
the STM32 but I've not really looked into it yet.

Regards,
Dave

Are you the ImageDisk Dave? Interesting to see you pop up on this thread and looking at these low cost microcontrollers.

js_12345678_55AA · « **Reply #80 on:** November 19, 2018, 06:24:31 pm »

Hi Dave,

Great that you did the 13-bit instruction set. This is not duplicated work: I did the 154C, which uses 14-bit instructions.

Please have a look at the github project I started:

https://github.com/free-pdk

If you like I can add you to the developer list.

Regarding FLAGs:

Have a look at the ".INC" include file in IDE. You can see that IO @ 0x00 holds all the flags:

Code: [Select]

	FLAG		IO_RW		0x00
		OV	IO_RW		FLAG.3
		AC	IO_RW		FLAG.2
		CF	IO_RW		FLAG.1
		ZF	IO_RW		FLAG.0

So with simple bit tests like T0SN IO.n and T1SN IO.n you can check a flag and conditionally skip the next instruction.

Example: to test the OV flag, the following code can be used:

Code: [Select]

	T1SN OV
	GOTO OVCLEAR
	NOP
	NOP
OVCLEAR:

You could also use T1SN FLAG.3
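For anyone modelling this on a PC, the skip tests can be sketched in C roughly as below. The bit positions come from the .INC excerpt above; the function and enum names are made up for illustration and are not from any PADAUK header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions within FLAG (IO address 0x00), as listed in the .INC excerpt. */
enum { FLAG_ZF = 0, FLAG_CF = 1, FLAG_AC = 2, FLAG_OV = 3 };

/* T0SN IO.n: skip the next instruction when bit n reads 0. */
static bool t0sn_skips(uint8_t io_value, unsigned bit)
{
    return ((io_value >> bit) & 1u) == 0;
}

/* T1SN IO.n: skip the next instruction when bit n reads 1. */
static bool t1sn_skips(uint8_t io_value, unsigned bit)
{
    return ((io_value >> bit) & 1u) != 0;
}
```

With FLAG = 0x08 (only OV set), `t1sn_skips(flag, FLAG_OV)` is true, so in the example above the GOTO is skipped and the two NOPs run.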

In the .INC file you also see that SP is emulated as an IO register and more interesting stuff.

In the next few days I will write a simulator for the 14 bit core. This should not take long at all.

JS

spth · « **Reply #81 on:** November 19, 2018, 06:26:48 pm »

Quote from: gdelazzari on November 19, 2018, 05:48:10 pm

Your small language seems really interesting. I also don't see much sense in porting "full" C to this architecture; IMHO it's just less of a hassle to write assembly. A mixed language with assembly plus some high-level constructs is the best thing, which is in fact what PADAUK did with their Mini-C. If we make an open source toolchain it wouldn't be bad to have our own language for this thing. Of course the self-hosting capability of your compiler (while being really cool) is pointless here, but I can see myself liking a language like Cowgol with the possibility to interleave assembly code like PADAUK's Mini-C lets you do. I mean, instead of replicating their Mini-C we could take the opportunity to try something else.

Of course if someone manages to make C compile for these MCUs through SDCC or whatever, that would be really cool, but I would like to see how much overhead it adds, especially regarding function calls.

The Padauks aren't the only architecture on which stack access is inefficient. For other such architectures (e.g. MCS-51), SDCC makes functions non-reentrant by default (i.e. unless the function is marked __reentrant or the --stack-auto command-line argument is supplied), placing local variables at fixed memory locations.
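To see what "non-reentrant" costs in practice, here is a small analogy in plain C, where a `static` local stands in for a local placed at a fixed memory location. This is only an illustration of the effect, not SDCC output, and both function names are invented.

```c
/* Locals on the stack: every call gets its own copy of `local`. */
static unsigned sum_auto(unsigned n)
{
    unsigned local = n;
    if (n == 0)
        return 0;
    unsigned rest = sum_auto(n - 1);
    return local + rest;
}

/* Local at a fixed address: the recursive call overwrites it. */
static unsigned sum_fixed(unsigned n)
{
    static unsigned local;
    local = n;
    if (n == 0)
        return 0;
    unsigned rest = sum_fixed(n - 1);
    return local + rest;   /* `local` was clobbered by the inner calls */
}
```

`sum_auto(3)` gives 6, while `sum_fixed(3)` gives 0 because the innermost call leaves 0 in the shared variable. Code that never recurses and is not called from an interrupt does not notice the difference, which is why the fixed-location default works well on such targets.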

Philipp

gdelazzari · « **Reply #82 on:** November 19, 2018, 07:13:48 pm »

Quote from: spth on November 19, 2018, 06:26:48 pm

SDCC makes functions non-reentrant by default (i.e. unless the function is marked __reentrant or the --stack-auto command-line argument is supplied), placing local variables at fixed memory locations.

Thanks for telling me. I've never actually used SDCC before and didn't know it supported this kind of architecture. Nice to know.

DocBen · « **Reply #83 on:** November 19, 2018, 07:44:14 pm »

By popular demand

https://github.com/mypdk/radare_plugins

Quote from: gdelazzari on November 19, 2018, 10:11:11 am

Anyhow my idea for a "universal" instruction table was to have a list of objects like this, if this can somehow interest you:

It may be faster than your current approach (which I like anyway, it is very readable IMHO), but pattern matching like that may not be the most efficient thing. You'll also have to reconstruct the parameter values and other stuff, which would be easier with a table like mine; at least that's what I anticipate.

You're absolutely right: your instruction table is more precise, but it is a lot less generic and requires more work.
My idea was to simply let the computer figure these things out at a later point. The basic idea is to have something as easy to read and maintain as possible for these little buggers.
Later a parser can look at the definitions and write something like your structs automatically or generate an optimized assembler/disassembler.

I'm an engineer and thus obligated to be as lazy as possible

gdelazzari · « **Reply #84 on:** November 19, 2018, 08:07:48 pm »

Quote from: DocBen on November 19, 2018, 07:44:14 pm

I'm an engineer and thus obligated to be as lazy as possible

I'm still not an engineer, just half-way through, so I can still be not lazy sometimes

I'm implementing the method I described, currently my table is defined like this:

Code: [Select]

static const auto PMS154_INSTRUCTIONS_SPEC = std::vector<instr_spec_t>
{
  // {mnemonic, opcode, opcode_mask, {args...}}

  {"nop",     0x0000, 0x3FFF, {}},

  {"addc",    0x0060, 0x3FFF, {{Accumulator, 0, 0}}},
  {"subc",    0x0061, 0x3FFF, {{Accumulator, 0, 0}}},
  {"izsn",    0x0062, 0x3FFF, {{Accumulator, 0, 0}}},
  {"dzsn",    0x0063, 0x3FFF, {{Accumulator, 0, 0}}},

  // ...

  {"wdreset", 0x0070, 0x3FFF, {}},

  {"pushaf",  0x0072, 0x3FFF, {}},
  {"popaf",   0x0073, 0x3FFF, {}},

  // ...

  {"xor",     0x00C0, 0x3FC0, {{IO, 0, 6}, {Accumulator, 0, 0}}},
  {"mov",     0x0180, 0x3FC0, {{IO, 0, 6}, {Accumulator, 0, 0}}},
  {"mov",     0x01C0, 0x3FC0, {{Accumulator, 0, 0}, {IO, 0, 6}}},

  {"ret",     0x0200, 0x3F00, {{Immediate, 0, 8}}},

  {"stt16",   0x0300, 0x3F81, {{Memory, 0, 7}}},
  {"ldt16",   0x0301, 0x3F81, {{Memory, 0, 7}}},
  {"idxm",    0x0380, 0x3F81, {{Memory, 0, 7}, {Accumulator, 0, 0}}},
  {"idxm",    0x0381, 0x3F81, {{Accumulator, 0, 0}, {Memory, 0, 7}}},

  {"swapc",   0x0400, 0x3E00, {{IO, 0, 6}, {Bit_N, 6, 3}}},

  {"comp",    0x0600, 0x3F80, {{Accumulator, 0, 0}, {Memory, 0, 7}}},
  {"comp",    0x0680, 0x3F80, {{Memory, 0, 7}, {Accumulator, 0, 0}}},
  {"nadd",    0x0700, 0x3F80, {{Accumulator, 0, 0}, {Memory, 0, 7}}},
  {"nadd",    0x0780, 0x3F80, {{Memory, 0, 7}, {Accumulator, 0, 0}}},

  // ...
};

Which I hope you agree is still pretty readable thanks to various new C++ things that allow it to be written like that. Maybe writing the opcodes in binary would have been easier, but well...

And the heart of the disassembler code is literally 60 lines of code; the assembler will be just a little bit more. Once you have such a table you can just iterate over the whole table and find whatever matches. If you are disassembling you check the opcode against the mask; if you are assembling you check the mnemonic and the parameters in their respective order. If you need to assemble for a variant of the instruction set you just pass another table.
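The matching step described above might look roughly like this in plain C. It is a cut-down sketch, not the actual libpdk code: the struct, `find_spec` and `field` are hypothetical names, only a few rows of the table are copied, and it assumes the two numbers in each operand descriptor are a bit offset and a bit width.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *mnemonic;
    uint16_t    opcode;   /* fixed bits of the instruction word */
    uint16_t    mask;     /* which bits of the word are fixed   */
} instr_spec;

/* A few rows copied from the table above (operand descriptors omitted). */
static const instr_spec SPECS[] = {
    {"nop",    0x0000, 0x3FFF},
    {"pushaf", 0x0072, 0x3FFF},
    {"popaf",  0x0073, 0x3FFF},
    {"xor",    0x00C0, 0x3FC0},
    {"ret",    0x0200, 0x3F00},
    {"swapc",  0x0400, 0x3E00},
};

/* Disassembly: the first row whose fixed bits match the word wins. */
static const instr_spec *find_spec(uint16_t word)
{
    for (size_t i = 0; i < sizeof SPECS / sizeof SPECS[0]; i++)
        if ((word & SPECS[i].mask) == SPECS[i].opcode)
            return &SPECS[i];
    return NULL;   /* unknown instruction */
}

/* Extract an operand field of `width` bits starting at bit `shift`. */
static unsigned field(uint16_t word, unsigned shift, unsigned width)
{
    return (word >> shift) & ((1u << width) - 1u);
}
```

For the word 0x02FF this finds the "ret" row, and `field(0x02FF, 0, 8)` yields the immediate 0xFF. Supporting another instruction set variant would then only mean passing a different table.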

Anyway I'm following my initial idea to create a "libpdk" which does everything as I previously described. That doesn't mean my approach is orthogonal to what you are doing with radare2, i.e. you could then use the library with the assembling/disassembling capabilities to create the plugin pretty easily just by calling in the library code.

DDunfield · « **Reply #85 on:** November 19, 2018, 08:27:23 pm »

Quote from: gslick on November 19, 2018, 05:52:34 pm

Are you the ImageDisk Dave? Interesting to see you pop up on this thread and looking at these low cost microcontrollers.

I am.

Perhaps more relevant to these discussions is my past-life of creating development tools for little processors (mostly 8-bit). C compilers, assemblers, debuggers, simulators etc.
If you're bored, you can look under "sample projects" at www.dunfield.com to see some of the similar/related stuff I used to support myself with.

MCUs in the "few cents" range are an interesting development!

Regards,
Dave

** Mostly retired now, but always up for an interesting project!

gdelazzari · « **Reply #86 on:** November 19, 2018, 09:00:38 pm »

I got a disassembler for the 14-bit ISA working in a preliminary state based on the work that js_12345678_55AA did REing the opcodes. There are some strange things in the code generated by the IDE however, such as:

- sometimes there are calls to 0x07f... which are in the "0x3FFF filled" part of the ROM, 0x3FFF is "call @0x7ff", and at 0x7ff there's "izsn [0x44]" which doesn't make a lot of sense (?)
- there are some opcodes we don't know about, such as 0x0006 or 0x0007
- the code flow doesn't make sense in general in some cases, for instance the "GOTO FPPA0 instruction" at 0x0001 (I'm reading from the datasheet of the PMS154C) is "goto @0xc", at that address there's some code that then calls @0x7f6 which doesn't make sense, shouldn't the user program start?

I can however locate the code I wrote. The void func(void) is at 0x0002 till 0x000b and the FPPA0 is at 0x0086 (there's a goto going there at 0x0013).

This is the code I compiled from the IDE

Code: [Select]

#include	"extern.h"

void func(void)
{
	byte x = 5;
	byte y = 6;
	byte c;

	c = x + y - ~x;
}

void	FPPA0 (void)
{
	.ADJUST_IC	SYSCLK=IHRC/2		//	SYSCLK=IHRC/2

	func();

	byte c = 0;

	while (1)
	{
		c++;
	}
}

I attached the disassembled ROM if someone wants to take a look.

Has someone figured out this stuff yet? I guess there's most likely something wrong with the opcodes RE, but IDK, I still have to take a look. I'm posting so that anyone can check if interested.

edit: just realized that 0x0006 and 0x0007 were the undocumented instructions js_12345678_55AA found https://www.eevblog.com/forum/blog/eevblog-1144-padauk-programmer-reverse-engineering/msg1970672/#msg1970672 , makes sense

js_12345678_55AA · « **Reply #87 on:** November 19, 2018, 09:12:02 pm »

Hi,

@gdelazzari

Read about how the IDE / WRITER inserts the "rolling code" (it is in the help / manual of the IDE).

For the 154 the example says to read the serial number / rolling code like this:

Code: [Select]

	call _SYS(ADR.ROLL);
	call _SYS(ADR.ROLL)+1;
	call _SYS(ADR.ROLL)+2;

Writer will insert "RET 0xAB", "RET 0xCD", "RET 0xEF" at those addresses, which are located at the end of the ROM.

One call I could not figure out is the one to the second-to-last ROM word (opcode 0x3FFE, i.e. CALL 0x7FE). Maybe Writer will insert another RET 0x12 there.

I need to wait for my writer to arrive so I can write and read back.

The very last ROM word contains the values to set up IHRC, Security Fuse, ... (everything you set up with .ADJUST_IC).

.ADJUST_IC will also put a lot of init code inside (it even contains delay loops, in case you set up something for SYSCLK other than "DISABLED").

Have fun,

JS

gdelazzari · « **Reply #88 on:** November 19, 2018, 09:22:14 pm »

Thanks, this now makes more sense. By the way I have my disassembler implemented with an instruction table like the one I posted before which I believe is a pretty flexible approach, adding an assembler and an emulator based on that should be pretty easy. I'm sure you're almost there to a complete disassembler too (and maybe other tools). What approach did you use? I'm trying to understand if my code could be useful or not, I guess I'll wait for you to upload your disassembler to the GitHub org?

js_12345678_55AA · « **Reply #89 on:** November 19, 2018, 09:36:00 pm »

I'm trying to build a cycle-accurate simulator which is ultra fast and can synchronize to real time (even on STM32 :-)). I took some inspiration from existing tiny, fast 6502 simulators.

Right now I'm trying to figure out (imagine) how the AC and OV flags are set. The user manual of the processors does not describe it in a way I can understand.

Here is a part of the source of already finished code for some opcodes (eA holds A, ePC the PC, eF the FLAGs, eSP the SP)

Code: [Select]

  uint8_t  eF = CPUioGet(0x00);             //get flags from emulated IO port, mapped like in IO flags: (- - - - V A C Z)
  uint8_t  eSP = CPUioGet(0x02);            //get SP from emulated IO port
  uint16_t opcode = CPUcodeGet( ePC++ );    //fetch next opcode and advance PC
  eCurrentCycle++;                          //increment current cycle counter

  if( 0x3FFE == opcode ) //special opcode ?
  {
    //TODO... find out what it does. IDE inserts when using ADJUST_CHIP
  }
  else
  //14 bit opcodes 0x0000 - 0x00BF
  if( opcode<=0x00BF )
  {
    switch( opcode )
    {
      case 0x0000: break; //NOP

      ...
  }
  ...
  else
  //6 bit opcodes 0x02.. , 0x2800 - 0x2FFF
  if( (0x0200 == (opcode&0x3F00)) || (0x2800 == (opcode&0x3800)) )
  {
    switch( opcode & 0x3F00 )
    {
      case 0x0200: eA = opcode&0xFF; ePC=(((uint16_t)CPUmemGet(--eSP))<<8); ePC|=CPUmemGet(--eSP); break; //RET k
  
      case 0x2800: eA += opcode&0xFF; eF=(eA>255)<<1;eA&=0xFF;eF|=!eA; break; //ADD A,k //TODO: OV, AC
      case 0x2900: eA -= opcode&0xFF; eF=(eA>=0)<<1;eA&=0xFF;eF|=!eA; break; //SUB A,k //TODO: OV, AC

      case 0x2A00: //CEQSN A,k
      case 0x2B00: //CNEQSN A,k
        T = eA-(opcode&0xFF);            //TODO: A-k or k-A ?
        if( ((0x2A00==(opcode&0x3F00)) && !T) || 
            ((0x2B00==(opcode&0x3F00)) && T) )
        {
          ePC++; eCurrentCycle++;
        }
        eF=(T>255)<<1;T&=0xFF;eF|=!T; //TODO: OV,AC (based on T)
        break;

      case 0x2C00: eA &= opcode&0xFF; eF&=~1;eF|=!eA; break; //AND A,k
      case 0x2D00: eA |= opcode&0xFF; eF&=~1;eF|=!eA; break; //OR A,k
      case 0x2E00: eA ^= opcode&0xFF; eF&=~1;eF|=!eA; break; //XOR A,k
      case 0x2F00: eA  = opcode&0xFF; break; //MOV A,k
    }
  }                
  else
  //5 bit opcodes 0x0400 - 0x0500, 0x1800 - 0x27FF
  if( (0x0400 == (opcode&0x3E00)) || ((opcode>=0x1800) && (opcode<=0x27FF)) )
  {
    uint8_t bit = 1<<((opcode>>6)&7);
    uint8_t addr = opcode&0x3F;
    switch( opcode & 0x3E00 )
    {
      case 0x0400: T=CPUioGet(addr);CPUioPut(addr,eF&2?T|bit:T&~bit); eF&=~2;eF|=(T&bit)?2:0; break; //SWAPC IO.n
      case 0x1800: if( !(CPUioGet(addr)&bit) ) { ePC++; eCurrentCycle++; } break;                    //T0SN IO.n
      case 0x1A00: if( CPUioGet(addr)&bit ) { ePC++; eCurrentCycle++; } break;                       //T1SN IO.n
      case 0x1C00: CPUioPut(addr,CPUioGet(addr)&~bit); break;                                        //SET0 IO.n
      case 0x1E00: CPUioPut(addr,CPUioGet(addr)|bit); break;                                         //SET1 IO.n
      case 0x2000: if( !(CPUmemGet(addr)&bit) ) { ePC++; eCurrentCycle++; } break;                   //T0SN M.n
      case 0x2200: if( CPUmemGet(addr)&bit ) { ePC++; eCurrentCycle++; } break;                      //T1SN M.n
      case 0x2400: CPUmemPut(addr,CPUmemGet(addr)&~bit);break;                                       //SET0 M.n
      case 0x2600: CPUmemPut(addr,CPUmemGet(addr)|bit);break;                                        //SET1 M.n
    }
  }
  else
  //3 bit opcodes 0x3000 - 0x3FFF
  if( (0x3000 == (opcode&0x3000)) )
  {

    if( opcode & 0x0800 ) //CALL needs to put current PC on stack
    {
      CPUmemPut( eSP++, ePC & 0xFF ); //TODO: check if on stack is little endian
      CPUmemPut( eSP++, ePC>>8 );     //TODO: check if on stack is little endian
    }
    eCurrentCycle++;
    ePC = opcode & 0x07FF;
  }
  else
  {
    //unknown instruction
    CPUexceptionEmulation("Unknown instruction", opcode );
  }

  CPUioPut(0x02,eSP);  //store SP to emulated IO port
  CPUioPut(0x00,eF);   //store flags to emulated IO port

JS

david.given · « **Reply #90 on:** November 19, 2018, 09:40:18 pm »

Is there any possibility of there being an internal ROM with more code in it? That would explain the weird jumps and calls.

Do any of these things have SPI or I2C? Because if we ever figure out the programming protocol, and you could make one of these processors program another, you could incredibly cheaply build a hypercube cluster of the flash versions of these things, all bootstrapped from one processor at the corner attached to the outside world... it'd be useless, but fascinating. Sadly, without some sort of fast comms it's probably not worth it.

gdelazzari · « **Reply #91 on:** November 19, 2018, 09:54:00 pm »

If you are looking for really fast emulation, why not a lookup table of function pointers? Keep in mind that, since it's a Harvard architecture + OTP + no way to read raw data from ROM, it is feasible to pre-process the ROM code as you wish if that helps you encode the instructions in a way that results in faster interpretation, given that you don't need the original content for anything other than interpreting the code.

For instance (just writing down what is passing through my mind right now, will need a lot of refinement, but the idea is this):

- map each instruction to an id (arbitrarily, whatever order you like) from 0 to the number of instructions - 1, so you can have a C array of pointers to functions that execute the instruction
- preprocess the ROM by "disassembling" the 14-bit (or whatever-bit) opcodes and representing them, for instance, as 3 bytes: one for the id, one for the first parameter and one for the second (if any)
- loop over the processed ROM and look up the corresponding emulation function like instr_funcs[instr_id]();, place the two parameters in a couple of globals or just let the emulation functions pick them up relative to the PC (which is global ofc)
- the emulation function does its stuff and returns

not sure how to cleanly handle cycle-accurate though, I'll have to think about that.

This may be a bit slower than your fastest if (opcode ...) evaluation, but you gain a constant and predictable time for all instructions, which I think is preferable, especially if the code that runs a cycle of the emulated CPU is inside an interrupt handler which fires at <x> MHz. All of this assumes you're not on a Cortex-M7, i.e. there's no cache in between; that messes everything up, obviously.
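A minimal C sketch of the pre-decode plus function-pointer idea, under some loud assumptions: only MOV A,k (0x2Fxx), ADD A,k (0x28xx) and GOTO (0x3000-0x37FF) are handled, using the opcode values from the simulator code earlier in the thread; anything else is treated as a NOP; flags and cycle counting are left out; all type and function names are invented here.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t id, arg0, arg1; } decoded_instr;  /* 3 bytes per instruction */
typedef struct { uint8_t a; uint16_t pc; } cpu_state;
typedef void (*instr_fn)(cpu_state *, const decoded_instr *);

enum { ID_NOP, ID_MOV_A_K, ID_ADD_A_K, ID_GOTO, ID_COUNT };

static void op_nop(cpu_state *c, const decoded_instr *d) { (void)c; (void)d; }
static void op_mov_a_k(cpu_state *c, const decoded_instr *d) { c->a = d->arg0; }
static void op_add_a_k(cpu_state *c, const decoded_instr *d) { c->a = (uint8_t)(c->a + d->arg0); }
static void op_goto(cpu_state *c, const decoded_instr *d)
{
    c->pc = (uint16_t)(d->arg0 | (d->arg1 << 8));
}

/* Indexed by instruction id, so dispatch is one table lookup and one call. */
static const instr_fn INSTR_FUNCS[ID_COUNT] = { op_nop, op_mov_a_k, op_add_a_k, op_goto };

/* One-time pass over the ROM: 14-bit word -> {id, arg0, arg1}. */
static decoded_instr predecode(uint16_t word)
{
    decoded_instr d = { ID_NOP, 0, 0 };   /* unknown words fall back to NOP (simplification) */
    if ((word & 0x3F00) == 0x2F00) {
        d.id = ID_MOV_A_K; d.arg0 = word & 0xFF;
    } else if ((word & 0x3F00) == 0x2800) {
        d.id = ID_ADD_A_K; d.arg0 = word & 0xFF;
    } else if ((word & 0x3800) == 0x3000) {
        d.id = ID_GOTO; d.arg0 = word & 0xFF; d.arg1 = (word >> 8) & 0x07;
    }
    return d;
}

/* Execute one instruction from the pre-decoded ROM. */
static void step(cpu_state *c, const decoded_instr *rom)
{
    const decoded_instr *d = &rom[c->pc++];
    INSTR_FUNCS[d->id](c, d);
}
```

Passing the decoded record by pointer is just one option; globals, as suggested in the list, would avoid the extra argument at the cost of less tidy code.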

Another (reeeeeally cool) thing to do would be JIT, I have no idea if that could be feasible given how different the architectures are, and of course you would lock yourself to ARM Cortex, but that would be indeed really cool.

spth · « **Reply #92 on:** November 19, 2018, 10:12:10 pm »

Quote from: david.given on November 19, 2018, 09:40:18 pm

Is there any possibility of there being an internal ROM with more code in it? That would explain the weird jumps and calls.

Do any of these things have SPI or I2C? Because if we ever figure out the programming protocol, and you could make one of these processors program another, you could incredibly cheaply build a hypercube cluster of the flash versions of these things, all bootstrapped from one processor at the corner attached to the outside world... it'd be useless, but fascinating. Sadly, without some sort of fast comms it's probably not worth it.

There is a reserved area of, depending on the device, 8 to 32 words at the end of the ROM. Not a place to hide a lot of code; as already explained (and stated in the documentation), it is used for checksums, rolling code, oscillator calibration data and code options. Since the 13-bit and 14-bit instruction sets do not have the ltabh and ltabl instructions, on those devices the rolling code is implemented as ret k instructions. You jump into the code, immediately jump back, and get the data byte in the accumulator.

The devices do not have I²C or SPI. But these can easily be emulated in software. That's actually one of the good things about the multicore design: You let one core handle some I/O protocol, and another core handles your normal program. IMO, this can be a good alternative to hardware peripherals. It makes the hardware simpler, cheaper, and all the gates can be used for the tasks at hand, instead of many of them sitting idle.

Philipp

spth · « **Reply #93 on:** November 19, 2018, 10:14:24 pm »

Quote from: DDunfield on November 19, 2018, 08:27:23 pm

Perhaps more relevant to these discussions is my past-life of creating development tools for little processors (mostly 8-bit). C compilers, assemblers, debuggers, simulators etc.
[…]
** Mostly retired now, but always up for an interesting project!

Would you be interested in contributing to SDCC, a free C compiler targeting small devices? There is a bit of a lack of developers and developer time recently. SDCC is still progressing, but some backends have fallen a bit behind (and of course there is work to do in the frontend, too).

Philipp

david.given · « **Reply #94 on:** November 19, 2018, 10:36:27 pm »

Wait, are you the Dunfield C Dunfield?

I think way back when, when I was looking for a portable C compiler, I kept running into your C page and thinking it looked ideal and then cursing its closed source-ness... then Amsterdam Compiler Kit got open sourced, and it was so awful to build I begged a copy of the source repository from Ceriel Jacobs, uploaded it to SourceForge, and became the maintainer for it. (I think I now have the oldest genuine timestamps on github.)

With modern eyes I see now that Micro-C is K&R, so it wouldn't have been any good to me back then, but it's still amazingly small. If you're ever thinking of releasing the source I'd be fascinated to see how it works. (Cowgol's a lot larger, and I needed to split it into eight different compiler passes to make it run on a 6502...)

spth · « **Reply #95 on:** November 19, 2018, 10:56:31 pm »

Quote from: ataradov on November 06, 2018, 04:25:55 pm

Quote from: TK on November 06, 2018, 03:31:03 pm
Padauk is not making any money selling the programmer, so why don't ask them for the programming protocol?
They have been asked and they refused to provide any information.

It could make sense to reverse-engineer the protocol for the Flash-based devices first:

1) These devices are probably more interesting to us than the OTP ones
2) The protocol is simpler (fewer voltages, max 8V on ICVPP)
3) The pins used are documented with signal names (VDD, GND, ICVPP, ICPDA, ICPCK)

Philipp

david.given · « **Reply #96 on:** November 19, 2018, 11:49:52 pm »

I don't know if the protocol's the same, but ICPCK/ICPDA are the same names that Holtek use for their line of ludicrously cheap microcontrollers.

http://www.holtek.com/documents/10179/106680/Holtek_Flash_MCU_Quick_Start_Guide_V100_en.pdf

DDunfield · « **Reply #97 on:** November 20, 2018, 03:34:45 pm »

Quote from: spth on November 19, 2018, 10:14:24 pm

Would you be interested in contributing to SDCC, a free C compiler targeting small devices?

Probably not, I've not seen much open-source code I'd be happy working on... but that's a different discussion.

Quote from: david.given on November 19, 2018, 10:36:27 pm

Wait, are you the Dunfield C Dunfield?

Mea culpa!

Regarding the source, I've not made it publicly available, but I've been known to give it out to interested parties.

For things not pertaining to the Padauk discussion, please contact me off list. Either through the messaging here (I'll try to check it at least every couple of days), or to my email at:

my-first-name{dot}my-last-name{@}gmail{dot}com

Regarding simulation:

With modern tools, "switch" can be as efficient as a function call table, and you can avoid the overhead of function entry/exit on each instruction (which can be several instructions to establish and release a stack frame).

I find the most difficult part doing the flags... C has a disadvantage of not having visibility to the processor flags, so when coding a simulator in C you need to work out the flag updates manually. This is usually more operations than performing the main operation of the simulated instruction.
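As an illustration of how much flag bookkeeping a single 8-bit add needs in C, here is a sketch using the conventional definitions (carry out of bit 7, half carry out of bit 3, signed overflow) and the flag bit layout posted earlier in the thread. Whether the Padauk cores define AC and OV exactly this way is an assumption that still needs checking against real hardware; `add8` and the `F_*` names are invented for this example.

```c
#include <stdint.h>

/* Flag bits as laid out in the FLAG register (- - - - OV AC C Z). */
enum { F_Z = 1u << 0, F_C = 1u << 1, F_AC = 1u << 2, F_OV = 1u << 3 };

/* Returns the 8-bit sum and writes the resulting flags to *flags. */
static uint8_t add8(uint8_t a, uint8_t k, uint8_t *flags)
{
    unsigned wide = (unsigned)a + k;
    uint8_t  r    = (uint8_t)wide;
    uint8_t  f    = 0;

    if (r == 0)
        f |= F_Z;
    if (wide > 0xFF)
        f |= F_C;    /* carry out of bit 7 */
    if (((a & 0x0F) + (k & 0x0F)) > 0x0F)
        f |= F_AC;   /* carry out of bit 3 */
    if (~(a ^ k) & (a ^ r) & 0x80)
        f |= F_OV;   /* operands share a sign, result has the other one */

    *flags = f;
    return r;
}
```

The addition itself is one line; the other four tests are exactly the extra work described above.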

Back in the day, when PCs were not a whole lot faster than the embedded systems we were emulating, I coded the simulation functions in assembly language, where I could reduce decoding to a very simple jump table, and access the flags set by the operation directly (which are usually at least similar to the setting of flags on the target processor).

My EMILY52 simulator could emulate an 8051/52 in real time on a Pentium1 (or even a decent 486). The simulation engine was coded in 8086 assembly and had an advantage that the 8051 flags were very similar to the 8086 flags. Instructions that didn't set flags would jump directly back to the next decode; instructions which set flags jumped to one of several routines which saved the flags in question (I think there were 3 or 4 different combinations of flags set by instructions) before heading on back to the next decode.

For instructions which used the flags, I simply loaded the 8086 flags from the saved flags before executing the instruction. When the flags were read as a byte, I used a lookup table to translate the saved 8086 flags into the corresponding 8051 flags positions.

8051 registers were only saved/restored to 8086 registers when the simulation stopped or started ... while running, they resided in 8086 registers and did not have to be loaded "per instruction".

Thing ran fast! - on more modern systems, it runs considerably faster than a hardware 805x ... It was so "streamlined" that the simulation engine had no "test for exit" .. just a tight instruction decode loop. When the user hit "STOP" I would plant a JMP instruction into the loop to break it which was then patched back by the exit handler.

Another technique I've used in the past to achieve high-speed simulation, which works well if:
1) You can live with variance in the speed the first time a block of code gets executed,
2) You either don't support self-modifying code or accept higher overhead for it

is to have an array of JMP targets for every address (easy if your processor has 1024 words of code memory, harder if it has megs)... At the beginning, each JMP targets a routine which decodes the instruction and updates the JMP target to point to the handler for that instruction before launching it.

On subsequent passes the JMP branches directly to the correct instruction handler, and never even has to look at the opcode.

This is particularly effective if the opcodes are words that are complex to decode. For simple opcodes, a simple JMP table will be nearly as fast and much simpler to debug.
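In C, with function pointers standing in for the JMP targets, the per-address technique could be sketched like this. It is a toy, not EMILY52 code: only MOV A,k (0x2Fxx, opcode value taken from the simulator source earlier in the thread) gets a real handler, everything else becomes a NOP, and the `decodes` counter exists purely to show that the slow path runs once per address.

```c
#include <stdint.h>

#define ROM_WORDS 1024

typedef struct cpu cpu;
typedef void (*handler_fn)(cpu *, uint16_t addr);

struct cpu {
    uint16_t   rom[ROM_WORDS];
    handler_fn target[ROM_WORDS];  /* one "JMP target" per code address */
    uint8_t    a;
    unsigned   decodes;            /* how often the slow path ran */
};

static void h_nop(cpu *c, uint16_t addr) { (void)c; (void)addr; }
static void h_mov_a_k(cpu *c, uint16_t addr) { c->a = c->rom[addr] & 0xFF; }

/* Slow path: decode once, patch the slot, then run the real handler. */
static void h_decode(cpu *c, uint16_t addr)
{
    uint16_t word = c->rom[addr];
    c->decodes++;
    c->target[addr] = ((word & 0x3F00) == 0x2F00) ? h_mov_a_k : h_nop;
    c->target[addr](c, addr);
}

/* Initially every address points at the decoder. */
static void cpu_init(cpu *c)
{
    for (unsigned i = 0; i < ROM_WORDS; i++) {
        c->rom[i] = 0;
        c->target[i] = h_decode;
    }
    c->a = 0;
    c->decodes = 0;
}

static void execute_at(cpu *c, uint16_t addr) { c->target[addr](c, addr); }
```

The first `execute_at` for an address pays for the decode; every later one goes straight to the handler. If the ROM could change at run time, the affected slots would have to be reset to `h_decode`, which is the self-modifying-code overhead mentioned in point 2.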

On a PC I'd probably use a simple JMP table for the PMS150, 13 bit = 8k words (32k bytes) and would give you one simple direct jump to the correct handler.

On the STM with 64K of code, 32K would be a bit much, I'd probably divide the opcodes into blocks based on which bits are used and decode with ANDs and a few switches.

But I haven't really looked into it that far yet...

Dave

js_12345678_55AA · « **Reply #98 on:** November 21, 2018, 09:11:58 pm »

Hi,

I disassembled and commented the code which the IDE inserts at the start of the OTP.

This code is used for calibration of the internal high speed RC oscillator.

It looks like WRITER is doing the following:

1. write the complete OTP and set the calibration value to 0xFF (actually a RET 0xFF)
2. reset the IC and wait for IC to execute the calibration measurement program (handshake and bit bang protocol using WRITER_CLK:PA.3 / WRITER_DAT_OUT:PA.5 / WRITER_DAT_IN:PA.6)
3. write the calibration value over the 0xFF value

=> This is a trick. The OTP is initially all '1'. Programming can change a '1' to a '0' ... but never back to '1'. So writing 0xFF leaves this cell all '1', and later, when the real value is known, it can be written there.
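The trick can be modelled in a few lines of C if OTP programming is treated as a bitwise AND of the old cell contents with the new word. The RET k encoding (0x0200 | k) is the one from the 14-bit opcode tables in this thread; the function names and the example calibration value 0x83 are made up.

```c
#include <stdint.h>

/* Programming can only clear bits: the cell ends up as old AND new. */
static uint16_t otp_program(uint16_t cell, uint16_t value)
{
    return cell & value;
}

/* RET k on the 14-bit core: 0x0200 | k. */
static uint16_t encode_ret(uint8_t k)
{
    return (uint16_t)(0x0200u | k);
}
```

Starting from a blank cell (0x3FFF), programming RET 0xFF gives 0x02FF, and a second pass with RET 0x83 gives 0x0283. Trying to go back to RET 0xFF afterwards leaves 0x0283 in place, which is why the placeholder has to be 0xFF.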

As you can see the (quite big) calibration code wastes some valuable OTP.
There is also a small programming mistake inside (near the end of the calibration).

==> @PADAUK: maybe you want to fix it :-) (The mistake is harmless since WRITER can just reset the IC in order to restart a calibration. Again ==> @PADAUK: you can save some valuable space here by just removing the instructions from 0x004F to 0x0052.)
With a little bit of practice the code could also be stripped here and there. Something we can do in the future.


0x0000:   0x0070    WDRESET                     ;reset watchdog

0x0001:   0x2f00    MOV A, 0x00      
0x0002:   0x0182    MOV IO(0x02), A  ;SP        ;set SP to memory start (0)

0x0003:   0x3fed    CALL 0x7ED                  ;get IHRCR (value inserted from WRITER)
0x0004:   0x018b    MOV IO(0x0B), A  ;IHRCR     ;setup IHRCR

0x0005:   0x3fee    CALL 0x7EE
0x0006:   0x019a    MOV IO(0x1A), A             ;BGTR? (not in datasheet)

0x0007:   0x2f00    MOV A, 0x00                 ;setup low voltage detector (value inserted from IDE .CHIP)
0x0008:   0x019b    MOV IO(0x1B), A             ;MISC_LVR 4V/3V5/3V/2V75/2V5/1V8/2V2/2V

0x0009:   0x2f34    MOV A, 0x34      
0x000a:   0x0183    MOV IO(0x03), A  ;CLKMD     ;setup clock mode (value inserted from IDE .ADJUST_IC)

0x000b:   0x3ffe    CALL 0x7FE                  ;get stored calibration value (stored during programming of OTP)

;This is a nice trick. The OTP reads all as '1' when not programmed. Programming will change the relevant bits to '0'.
;The trick is that you can program the OTP multiple times. It is always possible to change '1' to '0' but never the other way around.
;So they store 0x02FF in code memory at position 0x7FE, which translates to RET 0xFF (the 0xFF byte is still all bits '1').
;Later the programmer can change the 0xFF return value by overwriting it with the final value.
;This means all the code that follows is used for calibration during programming only (big waste of OTP)

0x000c:   0x2aff    CEQSN A, 0xFF               ;Check if is 0xFF (nothing written there?)
0x000d:   0x3054    GOTO 0x054                  ;Jump over calibration routine to user program

0x000e:   0x3fed    CALL 0x7ED                  ;get IHRCR (value inserted from programmer)
0x000f:   0x0b81    MOV [0x01], A               ;store it in memory @0x01

0x0010:   0x1f91    SET1 IO(0x11).6  ;PAC.6     ;configure PA.6 as output

;calibration routine
0x0011:   0x2f20    MOV A, 0x20      
0x0012:   0x0b80    MOV [0x00], A               ;store 0x20 in memory @0x00

0x0013:   0x1ad0    T1SN IO(0x10).3  ;PA.3      ;check for HIGH signal at PA.3   <-- check for handshake signal for WRITER
0x0014:   0x3013    GOTO 0x013                  ;wait until PA.3 is high

0x0015:   0x1f90    SET1 IO(0x10).6  ;PA.6      ;set PA.6 to HIGH                <-- send response to WRITER

0x0016:   0x0063    DZSN A                      ;1c big delay, underflows after first loop
0x0017:   0x3016    GOTO 0x016                  ;2c inner loop apx. 3*256 cycles = 768 cycles
0x0018:   0x1180    DZSN [0x00]                 ;1c
0x0019:   0x3016    GOTO 0x016                  ;2c outer loop apx. 32*(768+3) = 24672 cycles, 
                                                ;-672 cycles from first inner loop (A was init with 32, so 224 fewer iterations at 3c each) = apx. 24000 cycles total delay

0x001a:   0x1d90    SET0 IO(0x10).6  ;PA.6      ;set PA.6 to LOW                 <-- stop sending response to WRITER

0x001b:   0x18d0    T0SN IO(0x10).3  ;PA.3      ;check for LOW signal at PA.3    <-- wait for WRITER to stop sending handshake signal
0x001c:   0x301b    GOTO 0x01B                  ;wait until PA.3 is low

0x001d:   0x2f01    MOV A, 0x01      
0x001e:   0x1950    T0SN IO(0x10).5  ;PA.5      ;if PA.5 is LOW ==> A=0x01 , HIGH ==> A=0xFF <-- WRITER selects the IHRCR step via PA.5: +1 or 0xFF (= -1 modulo 256)
0x001f:   0x2fff    MOV A, 0xFF      

0x0020:   0x0c01    ADD A, [0x01]               ;add to IHRCR (value inserted from programmer) 1 or 255 (see above)
0x0021:   0x018b    MOV IO(0x0B), A  ;IHRCR     ;set new IHRCR
0x0022:   0x0b81    MOV [0x01], A               ;store new IHRCR value in memory @0x01

0x0023:   0x1ad0    T1SN IO(0x10).3  ;PA.3      ;check for HIGH signal at PA.3   <-- check for handshake signal for WRITER
0x0024:   0x3023    GOTO 0x023                  ;wait until PA.3 is high

0x0025:   0x1b50    T1SN IO(0x10).5  ;PA.5      ;check for HIGH signal at PA.5   <-- WRITER signals all done? 
0x0026:   0x304f    GOTO 0x04F                  ;jump to end of calibration code

0x0027:   0x2f04    MOV A, 0x04      
0x0028:   0x0188    MOV IO(0x08), A  ;MISC      ;disable low voltage detector

0x0029:   0x18d0    T0SN IO(0x10).3  ;PA.3      ;check for LOW signal at PA.3    <-- wait for WRITER to stop sending handshake signal
0x002a:   0x3029    GOTO 0x029                  ;wait until PA.3 is low

;start measurement
0x002b:   0x2f02    MOV A, 0x02
0x002c:   0x0182    MOV IO(0x02), A  ;SP        ;setup SP to memory @0x02

0x002d:   0x1304    CLEAR [0x04]                ;zero memory @0x04
0x002e:   0x1305    CLEAR [0x05]                ;zero memory @0x05
0x002f:   0x2f55    MOV A, 0x55
0x0030:   0x0b82    MOV [0x02], A               ;store 0x55 in memory @0x02
0x0031:   0x2f00    MOV A, 0x00      
0x0032:   0x0b83    MOV [0x03], A               ;store 0x00 in memory @0x03

;16 bit loop (0x0055) times some operations with internal value LDSPTL:LDSPTH (timing register?)
0x0033:   0x0006    LDSPTL                      ;?load from timing register L
0x0034:   0x0b04    XOR [0x04], A
0x0035:   0x0007    LDSPTH                      ;?load from timing register H
0x0036:   0x0805    ADD [0x05], A

0x0037:   0x1584    SL [0x04]                   ;rotate left 16 bit value
0x0038:   0x1685    SLC [0x05]
0x0039:   0x1004    ADDC [0x04]

0x003a:   0x1282    DECM [0x02]                 ;memory low byte -1
0x003b:   0x1083    SUBC [0x03]                 ;memory high byte -1 if carry set
0x003c:   0x1a40    T1SN IO(0x00).1  ;FLAG.CF   ;test for carry (0x0000 -1 => carry set)
0x003d:   0x3033    GOTO 0x033                  ;loop 

0x003e:   0x1f90    SET1 IO(0x10).6  ;PA.6      ;set PA.6 to HIGH                <-- send response to WRITER (measurement finished)


;send bit by bit the 16 bit measured value
0x003f:   0x1ad0    T1SN IO(0x10).3  ;PA.3      ;check for HIGH signal at PA.3   <-- check for handshake signal for WRITER
0x0040:   0x303f    GOTO 0x03F                  ;wait until PA.3 is high

0x0041:   0x1584    SL [0x04]                   ;16 bit shift left
0x0042:   0x1685    SLC [0x05]

0x0043:   0x0590    SWAPC IO(0x10).6 ;PA.6      ;set PA.6 according to carry (highest bit from 16 bit value before shift)

0x0044:   0x18d0    T0SN IO(0x10).3  ;PA.3      ;check for LOW signal at PA.3    <-- wait for WRITER to stop sending handshake signal
0x0045:   0x3044    GOTO 0x044                  ;wait until PA.3 is low

0x0046:   0x1950    T0SN IO(0x10).5  ;PA.5      ;check if PA.5 is LOW            <-- WRITER signals to send next bit
0x0047:   0x303f    GOTO 0x03F


0x0048:   0x1d90    SET0 IO(0x10).6  ;PA.6      ;set PA.6 to LOW

0x0049:   0x1ad0    T1SN IO(0x10).3  ;PA.3      ;check for HIGH signal at PA.3   <-- check for handshake signal for WRITER
0x004a:   0x3049    GOTO 0x049                  ;wait until PA.3 is high

0x004b:   0x18d0    T0SN IO(0x10).3  ;PA.3      ;check for LOW signal at PA.3    <-- wait for WRITER to stop sending handshake signal
0x004c:   0x304b    GOTO 0x04B                  ;wait until PA.3 is low

0x004d:   0x1b50    T1SN IO(0x10).5  ;PA.5      ;check for HIGH signal at PA.5   <-- WRITER signals to redo measurement 
0x004e:   0x302b    GOTO 0x02B                  ;if PA.5 is LOW do the measurement again

;there seems to be an error here in the program from PADAUK: it is very unlikely that PA.3 went high during the last 2 instructions, so this should surely be a check for HIGH
0x004f:   0x18d0    T0SN IO(0x10).3  ;PA.3      ;check for LOW signal at PA.3
0x0050:   0x304f    GOTO 0x04F                  ;wait until PA.3 is low

0x0051:   0x1b50    T1SN IO(0x10).5  ;PA.5      ;check for HIGH signal at PA.5   <-- WRITER signals all done?
0x0052:   0x3011    GOTO 0x011                  ;if PA.5 is LOW do the complete calibration again (including all handshakes)

0x0053:   0x3053    GOTO 0x053                  ;calibration successful, endless loop (chip needs reset + writing of calibration value from WRITER)

;----------------------------------------------------------------------------------------------------------------------------------------------

0x0054:   0x018b    MOV IO(0x0B), A  ;IHRCR     ;normal program start, set IHRCR 

0x0055:   0x3055    GOTO 0x055                  ;the user program (was just a "while(1){}" loop)
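
The length of the delay loop at 0x0016 to 0x0019 can be checked by stepping through it. The Python sketch below only counts iterations and then applies the rough "3 cycles per iteration" figure used in the listing comments (DZSN 1c + GOTO 2c); exact skip timing is not modelled, so the cycle number is an approximation. The function name is made up.

```python
def delay_loop_iterations(a=0x20, counter=0x20):
    """Step through DZSN A / GOTO / DZSN [0x00] / GOTO and count iterations."""
    inner = outer = 0
    while True:
        a = (a - 1) & 0xFF                    # DZSN A
        inner += 1
        if a != 0:
            continue                          # GOTO 0x016 (inner loop)
        counter = (counter - 1) & 0xFF        # DZSN [0x00]
        outer += 1
        if counter == 0:
            return inner, outer               # falls through to 0x001A
        # GOTO 0x016 with A == 0: the next inner run takes 256 steps


inner, outer = delay_loop_iterations()
approx_cycles = 3 * (inner + outer)
```

This gives 32 + 31 * 256 = 7968 inner iterations and 32 outer iterations, so roughly 3 * 8000 = 24000 cycles.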

Have fun,

JS

gdelazzari · « **Reply #99 on:** November 21, 2018, 09:37:49 pm »

Nice job js_12345678_55AA, really interesting. Will take an in-depth look.

I've now also got the 13-bit ISA implemented in my infrastructure, so I can disassemble PMS150(C) ROMs. I noticed an interesting thing (I still have to look at the disasm): the PMS150 (without the C) has an order of magnitude less init code than the PMS150C (which seems to have code that does more or less the same stuff the 154C is doing).

I'm attaching the two listings if someone wants to take a look and start figuring it out.

(To tell "my" code apart from PADAUK's code: in both ROMs my main simply increments a byte in memory. You'll find the inc [0x...]; goto @0x... as the last two instructions in both listings; those are the only two instructions that are not part of the init code.)

