Personally, I prefer the Padauk architecture to 8051, but there isn't that much of a difference. Both are much nicer than PIC. Both are far less nice than STM8.
I'm curious why you prefer Padauk over 8051? Also, why do you think STM8 is better than both? That hasn't necessarily been my experience.
In general, the 8051 seems to get a bad rap a lot of the time, and I'm not entirely sure why. Is it the split memory architecture? Or the lack of a good open source C compiler? Yes, I know about SDCC (and use it all the time), but the code it generates really isn't all that efficient or well optimized. Still, it is good enough for high-level things, and one can always drop down to inline assembly for the places where speed and/or size efficiency actually matters.
Speaking of program size, the 8051 is actually pretty efficient in this regard. There are a lot of 1-byte instructions in addition to the 2-byte instructions, and comparatively few 3-byte instructions. The Padauk ICs use the equivalent of a 2-byte instruction for everything (each instruction is a single 13-, 14-, 15-, or 16-bit word, depending on the part). Other architectures such as AVR and STM8 seem to have larger instructions on average, potentially contributing to larger programs (of course, it also depends on the instruction set and how many instructions are needed to accomplish the task at hand).
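To put rough numbers on the instruction-size point, here is a small Python tally for a hypothetical 8051 routine that copies a block from XRAM into internal RAM. The byte sizes are the standard MCS-51 encodings; the fixed-width figure simply charges the same eight instructions 2 bytes each, which is only a crude stand-in for a Padauk-style encoding, since real Padauk or AVR code would use a different instruction mix.

```python
# Encoded sizes (bytes) of the MCS-51 instructions used below.
SIZE_8051 = {
    "mov dptr, #imm16": 3,
    "mov r0, #imm8": 2,
    "mov r7, #imm8": 2,
    "movx a, @dptr": 1,
    "mov @r0, a": 1,
    "inc dptr": 1,
    "inc r0": 1,
    "djnz r7, rel": 2,
}

# Hypothetical routine: copy a block from XRAM into internal RAM.
copy_loop = [
    "mov dptr, #imm16",  # source address in XRAM
    "mov r0, #imm8",     # destination address in internal RAM
    "mov r7, #imm8",     # byte count
    "movx a, @dptr",     # loop: read one byte from XRAM
    "mov @r0, a",        # store it in internal RAM
    "inc dptr",          # advance source pointer
    "inc r0",            # advance destination pointer
    "djnz r7, rel",      # repeat until the count reaches zero
]


def size_variable(program, sizes):
    """Total bytes using the real variable-length 8051 encodings."""
    return sum(sizes[op] for op in program)


def size_fixed(program, word_bytes=2):
    """Total bytes if every instruction were one fixed-size word (assumption)."""
    return len(program) * word_bytes


print(size_variable(copy_loop, SIZE_8051))      # whole routine: 13 bytes
print(size_fixed(copy_loop))                    # flat 2-byte words: 16 bytes
print(size_variable(copy_loop[3:], SIZE_8051))  # loop body only: 6 bytes
```

The loop body alone is 6 bytes on the 8051 versus 10 under the flat 2-byte assumption; whether that advantage survives in a real comparison depends on how many instructions each architecture needs for the same job.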
Surely the Padauk parts, with their limited SRAM, are no better than the 8051 in terms of memory architecture. Even comparing the 8051 to other architectures, I don't find the split memory to be all that big of a deal. In fact, it is kind of liberating. The internal 128/256 bytes of SRAM can be thought of as a giant 'register' pool or scratch pad, and then one can use the larger XRAM for normal things like global variables that aren't accessed as often, or arrays that have to be accessed indirectly anyway. The 8051's instructions for accessing XRAM really aren't that different from what other architectures require for indirect access (e.g. mov dptr, #address, then movx a, @dptr or movx @dptr, a, followed by inc dptr). This is very similar to AVR's X, Y, and Z registers, which are used as indirect pointers into SRAM. Most 8051 MCUs these days support dual DPTRs as well. Maybe the biggest limitation is that the stack has to fit within the internal 256 bytes of SRAM, but I haven't found that to be much of a limit so far.
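Here is a toy Python model of the MOVX/DPTR access pattern, including an XRAM-to-XRAM copy that flips between two data pointers. The class and method names are made up for illustration. The 256/768 byte sizes are borrowed from the N76E003 specs mentioned in this thread, but whether a given part has dual DPTRs, and how you select between them, is vendor specific; toggle_dptr() is a hypothetical stand-in for whatever SFR bit does that on real hardware.

```python
class Mcs51Memory:
    """Toy model of 8051 data spaces (hypothetical, for illustration only)."""

    def __init__(self, iram_size=256, xram_size=768):
        self.iram = bytearray(iram_size)  # internal RAM: 'register' pool + stack
        self.xram = bytearray(xram_size)  # XRAM, only reachable via MOVX
        self.dptr = [0, 0]                # two data pointers (dual-DPTR parts)
        self.sel = 0                      # index of the active DPTR

    def mov_dptr(self, addr):   # mov dptr, #addr
        self.dptr[self.sel] = addr & 0xFFFF

    def movx_read(self):        # movx a, @dptr
        return self.xram[self.dptr[self.sel]]

    def movx_write(self, a):    # movx @dptr, a
        self.xram[self.dptr[self.sel]] = a & 0xFF

    def inc_dptr(self):         # inc dptr
        self.dptr[self.sel] = (self.dptr[self.sel] + 1) & 0xFFFF

    def toggle_dptr(self):      # hypothetical: switch the active DPTR
        self.sel ^= 1


def xram_copy(m, src, dst, n):
    """Copy n bytes within XRAM: DPTR0 walks the source, DPTR1 the destination."""
    m.mov_dptr(src)      # DPTR0 = source
    m.toggle_dptr()
    m.mov_dptr(dst)      # DPTR1 = destination
    m.toggle_dptr()      # back to DPTR0
    for _ in range(n):
        a = m.movx_read()    # read via DPTR0
        m.inc_dptr()
        m.toggle_dptr()
        m.movx_write(a)      # write via DPTR1
        m.inc_dptr()
        m.toggle_dptr()


m = Mcs51Memory()
m.xram[0:4] = bytes([1, 2, 3, 4])
xram_copy(m, 0x000, 0x100, 4)
print(list(m.xram[0x100:0x104]))  # [1, 2, 3, 4]
```

With a single DPTR, the same copy would typically have to save and reload the pointer for every byte (or fall back to paged MOVX @Ri for one side), which is where dual DPTRs earn their keep.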
One place where the 8051 might fall behind is cycles per instruction, but this really depends on which variant you work with. One of my favorite 8051 MCUs right now is the Nuvoton N76E003 (~$0.20 each for a TSSOP20 IC with 18 I/Os, 18KB flash (up to 4KB of which can hold a bootloader), 256 bytes SRAM, 768 bytes XRAM, 12-bit ADC, 2x UART, SPI, I2C, etc.). Its instructions take anywhere from 1 to about 5 clock cycles, with most averaging about 3. That is slightly inferior to the Padauk and AVR MCUs, where most instructions take 1 or 2 clock cycles, although the N76E003 runs at 16MHz and the Padauk ICs only run at 8MHz. But there are other cheap 8051 MCUs that are just as good. The CH551/CH552/CH554/CH559 MCUs are really efficient, executing most instructions in 1 clock cycle, with only a few taking 2 or more. And the CH551/CH552 are really inexpensive ($0.20 - $0.30) and even have built-in USB support!
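As a back-of-the-envelope check on the clock-speed trade-off, this Python snippet converts clock rate and average cycles per instruction into rough MIPS, using the figures from this post (16MHz and about 3 cycles for the N76E003, 8MHz and 1 to 2 cycles for Padauk). Real throughput depends on the actual instruction mix, so treat these as ballpark numbers only.

```python
def mips(clock_hz: float, avg_cycles_per_instr: float) -> float:
    """Rough throughput in millions of instructions per second."""
    return clock_hz / avg_cycles_per_instr / 1e6


# Figures quoted in this thread; the cycle averages are approximate.
estimates = {
    "N76E003 @ 16MHz, ~3 cycles avg": mips(16e6, 3.0),
    "Padauk @ 8MHz, all 1-cycle": mips(8e6, 1.0),
    "Padauk @ 8MHz, all 2-cycle": mips(8e6, 2.0),
}

for name, value in estimates.items():
    print(f"{name}: {value:.2f} MIPS")  # 5.33, 8.00 and 4.00 respectively
```

So the N76E003's faster clock roughly closes the gap: about 5.3 MIPS, which sits between the Padauk's 4 and 8 MIPS bounds.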
I agree that these Padauk MCUs are interesting and have their place, but I think they shine in a different area than the 8051 and other MCUs. To me, their main benefits are the low cost, the lower power consumption, and the fact that they are good enough for a lot of simple things. But the small SRAM/flash size and lack of hardware peripherals can certainly be a limitation in many projects where spending a little bit more goes a long way.