Author Topic: EEVblog #1144 - Padauk Programmer Reverse Engineering (Read 454561 times)

tim_ · « **Reply #700 on:** September 14, 2019, 11:38:00 pm »

Quote from: oPossum on September 14, 2019, 07:19:23 pm

...

That's quite interesting. Is that your video?

So the die size of the PMS150 seems to be ~0.3mm². That is really small. For example, an ATtiny4, which is Microchip's lowest-cost product, is ~1.25mm² (https://zeptobars.com/en/read/atmel-tiny4-attiny4-microcontroller). The ATtiny4 most likely uses a much older process, though.

The size of the OTP area of the PMS150C is roughly 0.0433 mm², assuming that the picture shows the full area of the die. This is very close to the size of the 1k14 macro of Ememory's "Neobit" OTP memory: http://www.ememory.com.tw/html/products_green_neobit.php A coincidence? Maybe not, there are not too many independent vendors of suitable memory IP around...

This memory technology is specifically designed to be manufacturable in low complexity (low mask count) foundry processes. 0.3 mm² of silicon in a 15 mask count 0.18µm process costs less than 0.01 USD in volume*. The challenge would be to get the product developed, packaged, tested, yielded and distributed for fractions of a cent. If that is solved, then a 3 cent microcontroller does not look too outlandish.

---
*Average wafer price for 200mm on 0.18µm across the industry is $625: https://anysilicon.com/major-pure-play-foundries-revenue-per-wafer-2017-2018/

The usable area of a 200mm wafer is approximately 30000mm², so there are ~100000 PMS150C dies per wafer. This would correspond to $0.006 per die.
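The footnote arithmetic can be reproduced with a few lines of C. This is only a rough sketch: it uses the wafer price, usable area and die area quoted above, and it assumes no edge losses, scribe lines or yield loss. The names `die_cost_t` and `estimate_die_cost` are made up for this illustration.

```c
#include <assert.h>

/* Result of the back-of-the-envelope estimate. */
typedef struct {
    double dies_per_wafer;
    double cost_per_die_usd;
} die_cost_t;

/* Divide the usable wafer area by the die area, then spread the wafer
 * price over the resulting die count. Edge losses, scribe lines and
 * yield are ignored (assumption). */
die_cost_t estimate_die_cost(double wafer_price_usd,
                             double usable_area_mm2,
                             double die_area_mm2)
{
    assert(die_area_mm2 > 0.0);

    die_cost_t r;
    r.dies_per_wafer = usable_area_mm2 / die_area_mm2;
    r.cost_per_die_usd = wafer_price_usd / r.dies_per_wafer;
    return r;
}
```

Calling `estimate_die_cost(625.0, 30000.0, 0.3)` gives about 100000 dies and roughly $0.006 per die, matching the numbers in the footnote.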

oPossum · « **Reply #701 on:** September 14, 2019, 11:48:44 pm »

Quote from: tim_ on September 14, 2019, 11:38:00 pm

That's quite interesting. Is that your video?

No, not my video.

spth · « **Reply #702 on:** September 16, 2019, 08:23:45 am »

Quote from: tim_ on September 10, 2019, 06:49:36 am

I don't think true C-compatible parallelism was their ultimate goal. (They don't even provide a C compiler...). Obviously that would have added a lot of additional complexity, as you point out.

Still, on such a "multicore" µC, you want efficient communication between multiple cores and interrupt handlers, which means efficient lockless atomics, which means a compare-and-swap instruction, preferably with indirect addressing mode.

Quote

The idea is probably to have specific tasks assigned to specific threads. For example, one thread runs the SPI peripheral, a second one runs a control loop, while the main thread mostly sleeps and does housekeeping/reconfiguration. Each of these would reside in its own memory space with dedicated resources and would use minimal inter-thread communication via pipes. This would keep synchronization overhead down a lot.

For SDCC, the main challenge would be to allow multiple main functions to initialize each FPPA. Maybe that could be done similarly to interrupts? It would be up to the user to ensure that variables are not reused between different threads.

Of course, something needs to be done with the p-register. A very rigid approach would be to clearly assign each function to only one FPPA.

A more pragmatic approach, for now, would be to have C-code always stay on FPPA0 and use assembler for the others...

Regarding preemptive multitasking: Of course this would free up some of the synchronization headache, but then the usefulness would be quite limited compared to using multiple hardware threads.

SDCC is a C compiler, and will aim to comply with the C standard. Where that can't be done efficiently, we still need to be able to comply, but can offer alternatives. E.g. when compiling for Padauk, functions are currently not reentrant by default (but reentrancy can be enabled for individual functions using __reentrant, or for the whole program using --stack-auto; it about triples code size though).

For multiple FPPA, my idea would be:

By default make the code work on any FPPA. That means a lock around any use of p. And no longer using p to pass return values (thus 16-bit return values would have to be passed on the stack, like we currently do for return values > 16 bit). Inefficient, but avoids nasty surprises.

Then provide a way for the user to specify that a function is to be used on a specific FPPA only, e.g. something like

Code: [Select]

[[sdcc::fppa(3)]] void f(int)
{
…
}

An FPPA3-specific p3 would be used instead of p when generating code for f.

tim_ · « **Reply #703 on:** September 19, 2019, 05:45:28 am »

Quote from: spth on September 16, 2019, 08:23:45 am

Still, on such a "multicore" µC, you want efficient communication between multiple cores and interrupt handlers, which means efficient lockless atomics, which means a compare-and-swap instruction, preferably with indirect addressing mode.

Well, this simply is not a universal multicore architecture. It is to be used within very narrow constraints. For example, if threads are used to emulate virtual periphery, it should be sufficient to work with very simple messaging schemes instead of comprehensive synchronisation. In most cases there will be a clear producer/consumer relationship and it will not be necessary to synchronize two or more threads with the same priority. There is also no real benefit from avoiding active waiting.

Btw, it should also be possible to use the atomic xch instruction? This is effectively like a CAS that can only compare to a single value.

Code: [Select]

  mov a,#1
loop:
  xch a,lock
  ceqsn a,#0
  goto loop

[...critical block...]

  clear lock
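
For readers more at home in C, the same test-and-set idea can be sketched with C11 atomics on a hosted compiler. This is an analogy, not SDCC code for the Padauk: `atomic_exchange` plays the role of `xch`, and the names `xch_lock_t`, `xch_try_lock`, `xch_lock` and `xch_unlock` are invented for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = held; mirrors the "lock" byte in the assembly. */
typedef struct {
    atomic_uchar state;
} xch_lock_t;

/* One xch-style attempt: unconditionally store 1 and inspect the old
 * value. The lock was acquired only if the old value was 0. */
bool xch_try_lock(xch_lock_t *l)
{
    return atomic_exchange(&l->state, 1) == 0;
}

/* Spin until the lock is acquired (the "goto loop" part). */
void xch_lock(xch_lock_t *l)
{
    while (!xch_try_lock(l)) {
        /* busy-wait */
    }
}

/* Release the lock (the "clear lock" part). */
void xch_unlock(xch_lock_t *l)
{
    assert(atomic_load(&l->state) == 1);  /* releasing a free lock is a bug */
    atomic_store(&l->state, 0);
}
```

Storing 1 into a lock that is already held is harmless, which is why an unconditional swap is enough for this particular pattern.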

Quote

SDCC is a C compiler, and will aim to comply with the C standard. Where that can't be done efficiently, we still need to be able to comply, but can offer alternatives. E.g. when compiling for Padauk, functions are currently not reentrant by default (but reentrancy can be enabled for individual functions using __reentrant, or for the whole program using --stack-auto; it about triples code size though).

For multiple FPPA, my idea would be:

By default make the code work on any FPPA. That means a lock around any use of p. And no longer using p to pass return values (thus 16-bit return values would have to be passed on the stack, like we currently do for return values > 16 bit). Inefficient, but avoids nasty surprises.

Then provide a way for the user to specify that a function is to be used on a specific FPPA only, e.g. something like

Code: [Select]

[[sdcc::fppa(3)]] void f(int)
{
…
}

An FPPA3-specific p3 would be used instead of p when generating code for f.

This sounds like a good plan. Now we only need the flash variants with more than 1 FPPA to be released.

spth · « **Reply #704 on:** September 19, 2019, 06:46:12 pm »

Quote from: tim_ on September 19, 2019, 05:45:28 am

Btw, it should also be possible to use the atomic xch instruction? This is effectively like a CAS that can only compare to a single value.
…

No. xch is just a plain swap, no compare there. It still can be used to implement spinlocks, which are useful as building blocks for more advanced functionality (though for spinlocks, there is a slightly more efficient alternative using srl instead). If xch had an indirect addressing mode, it could also be used to implement the atomic_flag type (but it doesn't).
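
The difference between a plain swap and a compare-and-swap can be illustrated with C11 atomics on a hosted compiler. This is only a sketch of the semantics, not Padauk code, and the wrapper names `swap_u8` and `cas_u8` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Plain swap (what xch does): always stores `desired` and returns the
 * previous value, whatever that value was. */
unsigned char swap_u8(atomic_uchar *obj, unsigned char desired)
{
    assert(obj != NULL);
    return atomic_exchange(obj, desired);
}

/* Compare-and-swap: stores `desired` only if the current value equals
 * `expected`; otherwise the object is left untouched. Returns true if
 * the store happened. */
bool cas_u8(atomic_uchar *obj, unsigned char expected, unsigned char desired)
{
    assert(obj != NULL);
    return atomic_compare_exchange_strong(obj, &expected, desired);
}
```

With a swap, a failed attempt has already overwritten the old value, which is fine for a 0/1 lock byte but not for general lock-free updates of arbitrary data.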

tim_ · « **Reply #705 on:** September 21, 2019, 05:32:27 pm »

FYI - some updates on my attempt on a toolchain:

https://github.com/cpldcpu/SimPad/tree/master/Toolchain

I implemented size-optimized softuart functions to be used as a serial debugger, including TX, string and number printing. They are less than 100 instructions in total and therefore should not interfere even with larger programs.

https://github.com/cpldcpu/SimPad/tree/master/Toolchain/examples/uartsend

Maybe I will integrate this into a minimized printf some time later. The printf provided in the standard library does not fit into the PFS154 code space right now, unless you force it to infer a "puts" by printing a string terminated with \n. I found this highly confusing.

Right now, the implementation has to be integrated by #including the c-code. I am not completely happy with that, but found no simpler way to avoid always including the binary. It seems that the linker for the PDK14 architecture will also link unused binaries?

Apart from that, the includes now support upper case SFR names as defines. This allows IntelliSense in VSCODE to work to at least some extent.

ali_asadzadeh · « **Reply #706 on:** September 22, 2019, 06:33:33 am »

Thumbs up

adding RX to it would be nice too

tim_ · « **Reply #707 on:** September 22, 2019, 06:54:06 am »

Quote from: ali_asadzadeh on September 22, 2019, 06:33:33 am

Thumbs up adding RX to it would be nice too

Well, there is just no straightforward way of doing that.

Option 1): Use no interrupt, RX routine has to wait until it receives a character.
Option 2): Use a pin change interrupt to detect RX, receive in interrupt handler.
Option 3): Thread on a separate FPPA

Option 1) means that you have to actively wait for RX in your main loop. High risk of losing data, basically no time to do anything else. Option 2) should be more stable. However, it would occupy the one and only interrupt of your device. That's a bit intrusive for something that is to be used for debugging. Option 3) would be the way it is intended. However, right now we don't have flash devices with 2 or more FPPA.

Well, I may work my way through all of these options at one point.
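
To make the cost of Option 1 concrete, here is a host-side simulation of a polled 8N1 receiver. It is only a sketch under stated assumptions: the RX line is given as an array of pre-taken samples instead of a real pin, and `soft_uart_rx_decode` and its parameters are hypothetical names, not part of any of the linked repositories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 8N1 frame from a sampled RX line (1 = idle/high).
 * `spb` is the number of samples per bit. Returns true and stores the
 * received byte in *out if a start bit and a valid stop bit were seen. */
bool soft_uart_rx_decode(const uint8_t *line, size_t n, size_t spb, uint8_t *out)
{
    assert(spb > 0);

    /* Busy-wait for the falling edge of the start bit. On real hardware
     * this is the loop that blocks the main program. */
    size_t i = 0;
    while (i < n && line[i] != 0) {
        i++;
    }
    if (i == n) {
        return false;                    /* line stayed idle */
    }

    /* Move to the middle of the start bit and confirm it is still low. */
    size_t pos = i + spb / 2;
    if (pos >= n || line[pos] != 0) {
        return false;                    /* glitch, not a start bit */
    }

    /* Sample the 8 data bits in the middle of each bit, LSB first. */
    uint8_t byte = 0;
    for (int bit = 0; bit < 8; bit++) {
        pos += spb;
        if (pos >= n) {
            return false;                /* ran out of samples */
        }
        if (line[pos] != 0) {
            byte |= (uint8_t)(1u << bit);
        }
    }

    /* The stop bit must be high. */
    pos += spb;
    if (pos >= n || line[pos] == 0) {
        return false;
    }

    *out = byte;
    return true;
}
```

On a real device every sample would be a timed pin read, so the CPU is tied up for the whole frame, which is exactly the drawback described for Option 1.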

ali_asadzadeh · « **Reply #708 on:** September 22, 2019, 08:30:43 am »

I think option 2 is the best for single cores!

js_12345678_55AA · « **Reply #709 on:** September 22, 2019, 09:35:33 am »

Quote from: ali_asadzadeh on September 22, 2019, 06:33:33 am

Thumbs up adding RX to it would be nice too

I implemented UART RX/TX based on timer interrupt some time ago. I cleaned up and added the source to the examples folder in development branch:

https://github.com/free-pdk/easy-pdk-programmer-software/tree/development/Examples

Quote from: tim_ on September 21, 2019, 05:32:27 pm

Apart from that, the includes now support upper case SFR names as defines. This allows IntelliSense in VSCODE to work to at least some extent.

This is a good reason to switch to all upper case in general for all IO registers. I will change my includes as well.

Have fun,

JS

js_12345678_55AA · « **Reply #710 on:** September 22, 2019, 05:08:43 pm »

Hi,

I got hold of 2 more PADAUK MCUs and added (partial) support for them in development branch:

PMS15A:
- this is just a marketing derivative of the good old PMS150C. It has the exact same chip inside and reports as PMS150C.
- you can see that they are the same in the PADAUK IDE include folder: Compared to PMS150C, the include for PMS15A just has one suspicious extra line: ".Assembly User_Size 200h" which limits IDE code size...

MCU390: http://www.zhienchina.com/products/1556.html
- in older PADAUK IDE there was an include for MCU390 (now missing)
- on the "manufacturer" web site you can find interesting things like how to use the PADAUK IDE with libraries and the ICE with trace / uart / stimulus (one big archive containing everything): http://www.zhienchina.com/Upfiles/down/MCU39X%E8%A7%A6%E6%91%B8%E8%8A%AF%E7%89%87%E5%BC%80%E5%8F%91%E5%8C%85.rar

:-)

JS

tim_ · « **Reply #711 on:** September 22, 2019, 10:17:27 pm »

I have been experimenting with the IHRC and BGTR calibration. As JS noted in an older post, there are calibration values stored on the flash in the PFS154/163 in two read-only memory addresses.

It appears to me that these are automatically loaded into the respective registers. The dump below shows the initial value of IHRCR and BGTR and it is exactly the same as the value stored on the flash. The value is also the same that EASYPDKPROG ends up with during calibration. It appears that no calibration is needed on the flash types, unless you want to adjust to a different voltage?

Is this confirmed anywhere?

I also noted that IHRCR is readable, although it is marked as write only in the official .INC file.

Also strange: The low frequency oscillator calibration register (ILRCR) has the same initial value as the IHRCR. There is no corresponding value in the flash.

Code: [Select]

07E0: 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 0282 025A 1FFE
07F0: 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF
IHRCAL: 82
IHRCRNOW: 82
ILRCRNOW: 82
BGTCAL: 5A
BGTRNOW: 5A

from include file:

Code: [Select]

	IHRCR		IO_WO		0x0B
	ILRCR		IO_WO		0x39 (-)		//	[7:4]
	BGTR		IO_RW		0x1A (0x2C)		//	[7:3]

js_12345678_55AA · « **Reply #712 on:** September 22, 2019, 11:40:56 pm »

Quote from: tim_ on September 22, 2019, 10:17:27 pm

I have been experimenting with the IHRC and BGTR calibration. As JS noted in an older post, there are calibration values stored on the flash in the PFS154/163 in two read-only memory addresses.

It appears to me that these are automatically loaded into the respective registers. The dump below shows the initial value of IHRCR and BGTR and it is exactly the same as the value stored on the flash. The value is also the same that EASYPDKPROG ends up with during calibration. It appears that no calibration is needed on the flash types, unless you want to adjust to a different voltage?

Is this confirmed anywhere?

I also noted that IHRCR is readable, although it is marked as write only in the official .INC file.

Also strange: The low frequency oscillator calibration register (ILRCR) has the same initial value as the IHRCR. There is no corresponding value in the flash.

Code: [Select]

07E0: 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 0282 025A 1FFE
07F0: 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF 3FFF
IHRCAL: 82
IHRCRNOW: 82
ILRCRNOW: 82
BGTCAL: 5A
BGTRNOW: 5A

from include file:

Code: [Select]

	IHRCR		IO_WO		0x0B
	ILRCR		IO_WO		0x39 (-)		//	[7:4]
	BGTR		IO_RW		0x1A (0x2C)		//	[7:3]

The factory values are encoded as "RET i" where i is the calibration value.
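
As an illustration of that encoding, the words 0282 and 025A from the quoted dump can be decoded on a host PC as follows. This is a sketch only: the opcode pattern 0x02kk for "RET i" is inferred from the dump itself, and the names `decode_ret_immediate`, `PDK14_RET_K_OPCODE` and `PDK14_RET_K_MASK` are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption inferred from the dump: a 14-bit "RET i" word looks like
 * 0x02kk, where kk is the 8-bit immediate. */
#define PDK14_RET_K_OPCODE 0x0200u
#define PDK14_RET_K_MASK   0x3F00u   /* upper 6 bits of a 14-bit word */

/* Extract the calibration value from a code word. Returns false if the
 * word is not a "RET i" instruction (e.g. an erased 0x3FFF word). */
bool decode_ret_immediate(uint16_t word, uint8_t *value)
{
    assert(value != NULL);

    if ((word & PDK14_RET_K_MASK) != PDK14_RET_K_OPCODE) {
        return false;
    }
    *value = (uint8_t)(word & 0x00FFu);
    return true;
}
```

Applied to the dump, 0x0282 yields 0x82 (the IHRC value) and 0x025A yields 0x5A (the BGTR value), while the erased words 0x3FFF are rejected.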

My observations showed that those values are not "auto loaded". However if you use Padauk IDE they always insert calls for factory BGTR tuning in startup code and maybe with some settings also for IHRCR.

IHRCR / ILRCR / BGTR are WRITE-ONLY registers. So not sure how / why it was possible for you to read them back? ... interesting?

Factory calibrated IHRC value is fully understood and documented (calibration value is for 5V / 16MHz):

The following macros can be used in SDCC:

#define PFS154_USE_FACTORY_IHRCR_16MHZ() { _ihrcr = *((const unsigned char*)(0x87ed)); }

#define PFS173_USE_FACTORY_IHRCR_16MHZ() { _ihrcr = *((const unsigned char*)(0x8bed)); }

BGTR is bandgap tuning. You need this when you want to use the comparator or ADC. It will tune the internal band gap to 1.2V (which can be used as reference for internal comparator or ADC)

I prepared some tests and will implement auto tuning for it in next release of programmer.

There are also factory calibrated values:

#define PFS154_USE_FACTORY_BGTR() { _bgtr = *((const unsigned char*)(0x87ee)); }

#define PFS173_USE_FACTORY_BGTR() { _bgtr = *((const unsigned char*)(0x8bee)); }

Have fun,

JS

tim_ · « **Reply #713 on:** September 23, 2019, 03:07:19 am »

I checked a bit further:
Edit2: scrap previous post

Interestingly, reading from IHRCAL, BGTCAL does not change the accu at all. This is why I was led to believe they would read the same values, because the accu still retained the value from reading the flash locations. Indeed, it is necessary to read the calibration manually.

A bit more compact macro:

#define PFS154_USE_FACTORY_TRIMMING() { __asm__ (".word (0x3fed)\nmov _ihrcr,a\n.word (0x3fee)\nmov _bgtr,a\n"); }

The bandgap is probably also used by the oscillator, unless it uses a separate voltage reference. It's a good idea to trim it, too. The resolution is rather fine though, around 500mV across the entire 8 bit range of BGTR, ~2mV per count. I tested this by using the comparator and Vint_ref as reference. It may be a challenge to measure the voltage accurately enough.

js_12345678_55AA · « **Reply #714 on:** September 23, 2019, 09:13:09 am »

Quote from: tim_ on September 23, 2019, 03:07:19 am

The bandgap is probably also used by the oscillator, unless it uses a separate voltage reference. It's a good idea to trim it, too. The resolution is rather fine though, around 500mV across the entire 8 bit range of BGTR, ~2mV per count. I tested this by using the comparator and Vint_ref as reference. It may be a challenge to measure the voltage accurately enough.

My plan is to use VDD and VBandgap as inputs for the comparator, send the comparator output to the dedicated output pin and capture this with programmer.
-> we can set up VDD with the programmer and measure it very precisely with the programmer ADC
-> by selecting a good "case" / "N" in GPCS / GPCR we should be able to do a simple and accurate trimming

I prepared a table for selecting a good value (see picture below).

The places with red VDD values are unusable for trimming. Unfortunately there is no 5.0 V possibility.

So we need to use "near requested VDD" values for trimming (e.g. 4.8V for 5.0V trimming).

I think "case 3" is suitable for all trimmings.

JS

illiseon · « **Reply #715 on:** September 23, 2019, 10:41:56 am »

Hello JS:
I do not understand how you use SPI to calibrate the IC and measure the final frequency. Another thing I'm curious about is whether there are direct ways to operate on OTP RAM, such as burning programs directly into OTP RAM and executing programs in RAM.

tim_ · « **Reply #716 on:** September 23, 2019, 05:21:05 pm »

Apparently only bits [7:3] of BGTR are used. I determined Vbg to be at ~1.08V for BGTR=0 and 1.47V for BGTR=240 on one PFS154. This means that the voltage resolution of the trimming register is 8*(1.47V-1.08V)/240= ~ 13mV.

To reach a trimming accuracy that is better than 13mV it would be necessary to apply 4.8V+-26mV externally (13*4.8/1.2/2=26). This assumes perfect resistor matching, which is unlikely. Certainly not impossible but also not without a challenge. If this is not possible it may be better to use the factory trimming.

A more accurate solution is obviously to apply 1.2V+-6.5mV to an external input of the comparator and use that as a trimming reference.
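
The two estimates above can be written out as a short C sketch. It only restates the arithmetic from this post (measured Vbg at BGTR=0 and BGTR=240, 8 register counts per effective step, VDD of 4.8V compared against a 1.2V band gap); the function names `bgtr_step_mv` and `vdd_tolerance_mv` are hypothetical.

```c
#include <assert.h>

/* Voltage change per effective trim step, in millivolts.
 * code_span:      register range over which v_low..v_high was measured
 * codes_per_step: register counts per effective step (8 if only bits
 *                 [7:3] are used) */
double bgtr_step_mv(double v_low, double v_high,
                    unsigned code_span, unsigned codes_per_step)
{
    assert(code_span > 0);
    return (v_high - v_low) * 1000.0 * codes_per_step / code_span;
}

/* VDD tolerance (+/- mV) needed to resolve half a trim step when VDD
 * is divided down to the band gap level by an ideal divider. */
double vdd_tolerance_mv(double step_mv, double vdd, double vref)
{
    assert(vref > 0.0);
    return step_mv * (vdd / vref) / 2.0;
}
```

With the numbers from the post, `bgtr_step_mv(1.08, 1.47, 240, 8)` is about 13 mV and `vdd_tolerance_mv(13.0, 4.8, 1.2)` is about 26 mV. Half a step at the 1.2V level is the 6.5 mV quoted for the external reference.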

js_12345678_55AA · « **Reply #717 on:** September 23, 2019, 06:47:13 pm »

Quote from: illiseon on September 23, 2019, 10:41:56 am

I do not understand how you use SPI to calibrate IC and measure the final frequency.

This is a simple trick: SPI is enabled on the easy pdk programmer as slave. This means the clock is generated by the IC. Now you just start a precision timer and use SPI to receive several thousand bytes (8 bits each) from the IC.
After a specific number of clocks (bytes) has been received from SPI, you check the timer value, and now you know how many clocks were sent from the IC in what time.
Like this we can measure very high frequencies (>10 MHz, since SPI is in hardware on the easy pdk programmer MCU) and we can also use the SPI transmit line to send single pulses to the IC to change the tuning value.
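
The calculation behind this measurement is simple enough to sketch in C. This is not the programmer firmware: `measured_clock_hz` and its parameters are hypothetical, and the timer frequency is whatever the measuring MCU happens to use.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate the target's clock frequency from the number of SPI bytes
 * it clocked out and the timer ticks that elapsed meanwhile.
 * timer_hz is the tick rate of the measuring timer (assumption: known
 * and accurate). */
double measured_clock_hz(uint32_t bytes_received,
                         uint32_t timer_ticks,
                         double timer_hz)
{
    assert(timer_ticks > 0);
    assert(timer_hz > 0.0);

    double clocks = (double)bytes_received * 8.0;  /* 8 clock pulses per byte */
    double seconds = (double)timer_ticks / timer_hz;
    return clocks / seconds;
}
```

For example, 2000 bytes received in 1000 ticks of a 1 MHz timer correspond to 16000 clocks in 1 ms, i.e. about 16 MHz.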

Quote from: illiseon on September 23, 2019, 10:41:56 am

Another thing I'm curious about is that there are direct ways to operate on OTP ram, such as burning programs directly into OTP ram and executing programs in ram.

The Padauk ICs do not allow execution from RAM. Also, the RAM is so tiny (64-256 bytes) that you would not have the space to fit a lot of instructions inside.
However, there are FLASH-based Padauk ICs like PFS154 and PFS173 which can be erased and written many times (>1000 times). These are the ICs we usually use for development.

JS

js_12345678_55AA · « **Reply #718 on:** September 23, 2019, 06:52:57 pm »

Quote from: tim_ on September 23, 2019, 05:21:05 pm

Apparently only bits [7:3] of BGTR are used. I determined Vbg to be at ~1.08V for BGTR=0 and 1.47V for BGTR=240 on one PFS154. This means that the voltage resolution of the trimming register is 8*(1.47V-1.08V)/240= ~ 13mV.

To reach a trimming accuracy that is better than 13mV it would be necessary to apply 4.8V+-26mV externally (13*4.8/1.2/2=26). This assumes perfect resistor matching, which is unlikely. Certainly not impossible but also not without a challenge. If this is not possible it may be better to use the factory trimming.

A more accurate solution is obviously to apply 1.2V+-6.5mV to an external input of the comparator and use that as a trimming reference.

Hi,

I think you are looking at this too precisely. In order to understand the magnitude of error our cheap ICs have, I suggest the following experiments:

1) output an alternating clock signal on an IO pin and measure frequency with oscilloscope
2) now touch the IC with your finger (which slightly increases temperature) and watch the HUGE difference of the clock

3) tune and perform your band gap comparator test (e.g. switch at 1.200V)
4) now touch the IC with your finger (which slightly increases temperature) and watch the HUGE difference of the band gap switch

...

BTW: I found factory trimming of bandgap on all FLASH and some OTP variants, however factory trimming of IHRC is available on FLASH based ICs only. ILRC does not have factory trimming values in any of the devices I checked.

JS

ali_asadzadeh · « **Reply #719 on:** September 24, 2019, 08:50:16 am »

JS, earlier you said you have a good sample for generating sound with these babies? Is it ready? Also, please add more examples to the repo, like ADC, I2C, SPI, etc.

tim_ · « **Reply #720 on:** September 25, 2019, 04:49:09 pm »

Quote from: js_12345678_55AA on September 23, 2019, 06:52:57 pm

Hi,

I think you are looking at this too precisely. In order to understand the magnitude of error our cheap ICs have, I suggest the following experiments:

1) output an alternating clock signal on an IO pin and measure frequency with oscilloscope
2) now touch the IC with your finger (which slightly increases temperature) and watch the HUGE difference of the clock

3) tune and perform your band gap comparator test (e.g. switch at 1.200V)
4) now touch the IC with your finger (which slightly increases temperature) and watch the HUGE difference of the band gap switch

BTW: I found factory trimming of bandgap on all FLASH and some OTP variants, however factory trimming of IHRC is available on FLASH based ICs only. ILRC does not have factory trimming values in any of the devices I checked.

Well, the lower-end Padauks probably don't have an LDO, to save die space, which means the PSRR is not good. That could explain why they react to touch. At least according to the datasheet, the temperature drift of the IHRC is not that bad? They cut corners everywhere to reduce die area. I noticed that the sourcing capability of the GPIO is also much worse than on other MCUs. Probably a way to avoid large I/O transistors.

My point about the band gap trimming is that it will not be easy to do it better than the factory calibration. Trimming on the programmer would only be an improvement if you want to trim the band gap to a value different from 1.2V.

tim_ · « **Reply #721 on:** September 29, 2019, 10:03:35 am »

I can only corroborate JS's findings about the very bad noise immunity of the PFS154.

I wasted a lot of time getting an LED to work as a light sensor on the PFS154. My only conclusion is that the PFS154 has severe internal coupling issues. I summarized my findings here:

https://cpldcpu.wordpress.com/2019/09/28/a-led-candle-based-on-the-3-cent-mcu/

It seems to be advised to stick to purely digital designs, at least for the lower end Padauk MCUs.

tim_ · « **Reply #722 on:** September 29, 2019, 10:12:36 am »

Btw, here is an example to control WS2812 LEDs using SDCC:

https://github.com/cpldcpu/SimPad/tree/master/Toolchain/examples/WS2812_blinky

socram · « **Reply #723 on:** September 29, 2019, 01:58:03 pm »

One quick question: has the read protection of the PFS154 already been documented? I want to build a tiny device and sell it, and I'd like to protect it, but I've not seen how the "code options" such as LVR voltage, code protection, etc. are programmed in.

jhpadjustable · « **Reply #724 on:** September 30, 2019, 02:54:53 am »

Quote from: socram on September 29, 2019, 01:58:03 pm

One quick question: has the read protection of the PFS154 already been documented?

Found this with a quick search of the thread: https://www.eevblog.com/forum/blog/eevblog-1144-padauk-programmer-reverse-engineering/msg2305515/#msg2305515
Also see https://github.com/free-pdk/fppa-pdk-documentation/blob/master/Reserved_Area_Last_8_Words_Of_Codemem.txt

