Author Topic: STM32, ghetto style (Read 156650 times)

dannyf · « **Reply #75 on:** August 23, 2014, 04:41:11 pm »

I don't think it is possible.

On the flip side, you may be able to get cosmic (or sdcc) to work with other IDEs (Eclipse or CB, for example). I never tried it myself but shouldn't be that difficult.

dannyf · « **Reply #76 on:** August 23, 2014, 08:50:23 pm »

Quote

The STM8 chips are incredibly inexpensive - I have a few of them on a boat from China at 30 cents apiece (STM8S003F).

I just received them today. Soldered 4 of them to a TSSOP28 adapter board and connected them to my stlink clone -> voilà! The LED blinked, no exception. For a 30-cent mcu, not a bad deal at all.

A word of caution: STM8S003F requires an external capacitor on the Vcap pin to get it going. Without it, the mcu doesn't run and doesn't talk to stlink.

The datasheet has requirements for the capacitor, but I have used 4.7nF poly, 0.1uF ceramic, and up to 470uF electrolytic on loooooong wires. All worked like a charm.

Kjelt · « **Reply #77 on:** August 23, 2014, 09:20:51 pm »

Quote from: dannyf on August 23, 2014, 04:41:11 pm

On the flip side, you may be able to get cosmic (or sdcc) to work with other IDEs (Eclipse or CB, for example). I never tried it myself but shouldn't be that difficult.

Building should be possible, but debugging requires a dedicated plugin interfacing with the Raisonance RLink or stlink; never heard those existed for Eclipse?

dannyf · « **Reply #78 on:** August 23, 2014, 10:38:20 pm »

Quote

never heard those existed for eclipse?

No. But I am not that close to the open-source universe, so it is entirely possible (and likely) that I am wrong here.

mrflibble · « **Reply #79 on:** August 24, 2014, 11:32:31 am »

Quote from: dannyf on August 21, 2014, 12:43:42 am

Looks like someone beat me to it: there are quite a few such boards on ebay - stm8s003f, voltage regulator, reset switch, isp header, a couple leds and 0.1" dip pin out, plus unfitted crystal.

All for less than $2 shipped. and seems to be fairly popular.

Oooh, nice find. Thanks for the tip.

dannyf · « **Reply #80 on:** August 26, 2014, 12:24:29 am »

Quote

0.9mA @ 2MHz,

From the datasheet, the current consumption is 0.84mA typical, 1.05mA max;

Quote

3.9mA @ 16MHz.

From the datasheet, the current consumption is 3.7mA typical, 4.5mA max.
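
As a quick sanity check of the two comparisons above, here is a minimal sketch that tests a measurement against the datasheet maximum and reports how far it sits above the typical figure. The struct and function names are made up for illustration; the numbers are the ones quoted in this post.

```c
#include <stdbool.h>

/* Datasheet typical/max supply current in mA, as quoted above. */
struct idd_spec {
    double typ_ma;
    double max_ma;
};

/* True if a measurement is no higher than the datasheet maximum. */
bool idd_within_max(double measured_ma, struct idd_spec spec)
{
    return measured_ma <= spec.max_ma;
}

/* How far the measurement sits above the typical figure, in percent. */
double idd_over_typ_percent(double measured_ma, struct idd_spec spec)
{
    return (measured_ma - spec.typ_ma) / spec.typ_ma * 100.0;
}
```

With the quoted figures, both measurements are under the maximum and land roughly 7% (2MHz) and 5% (16MHz) above typical.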

dannyf · « **Reply #81 on:** August 26, 2014, 11:28:01 pm »

Highly unscientific testing of Keil mdk vs. gcc:

I have a simple blinky - similar to the one posted above but in modular form, in a project with the st standard peripheral library + my middleware (meaning that most of the code is unused).

1) Keil mdk:

with no optimization, and no discard of unused code, the output is 13KB.
with the most aggressive optimization (plus microlib) and discard unused code, the output is 1.6KB;

2) gcc-arm:

with no optimization, the output is over 37KB -> wouldn't fit my chip (STM32F030F);
with the most aggressive optimization and discard unused code, the output is 2.1KB.

amend:

3) IAR:

with no optimization, the output is 3KB - I think it must have cut unused code;
with the most aggressive optimization and multi-file compilation, the output is 1KB.
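
For readers wondering what "discard unused code" means in practice: with a GNU toolchain it is usually done by compiling each function into its own section and letting the linker drop sections that nothing references. A minimal sketch, assuming the GCC/ld flags named in the comment; the function names are hypothetical, and the exact settings used for the numbers above are not stated in this thread.

```c
#include <stdint.h>

/* Hypothetical library: two functions in one source file.
 * Assumed build flags (GNU toolchain):
 *   compile: -Os -ffunction-sections -fdata-sections
 *   link:    -Wl,--gc-sections
 * With these, each function gets its own section and the linker can
 * drop any section that nothing refers to. */

/* Referenced by the application, so it is kept. */
uint32_t led_mask(unsigned pin)
{
    return (uint32_t)1u << (pin & 31u);
}

/* Never referenced: a candidate for removal by --gc-sections.
 * Without the flags above it stays in the image as dead weight. */
uint32_t never_called(uint32_t x)
{
    return x * 2654435761u;
}
```

This is the likely reason a big peripheral library costs little once unused code is discarded, and a lot when it is not.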

sporadic · « **Reply #82 on:** August 27, 2014, 05:13:49 pm »

Quote from: dannyf on August 26, 2014, 11:28:01 pm

Highly unscientific testing of Keil mdk vs. gcc:

I have a simple blinky - similar to the one posted above but in modular form, in a project with the st standard peripheral library + my middleware (meaning that most of the code is unused).

1) Keil mdk:

with no optimization, and no discard of unused code, the output is 13KB.
with the most aggressive optimization (plus microlib) and discard unused code, the output is 1.6KB;

2) gcc-arm:

with no optimization, the output is over 37KB -> wouldn't fit my chip (STM32F030F);
with the most aggressive optimization and discard unused code, the output is 2.1KB.

amend:

3) IAR:

with no optimization, the output is 3KB - I think it must have cut unused code;
with the most aggressive optimization and multi-file compilation, the output is 1KB.

Nothing scientific, but was curious how Atmel's solutions fared out of the box. Here's a quick comparison against an Atmel SAM D20E14 (Cortex M0+) using GCC and ASF in Atmel Studio 6.2 (default configs). Just a pin toggle like you had above. Coming from AVRs, not used to seeing such high memory usage, especially data, for such basic stuff.

Code: [Select]

Debug config:
    Program Memory Usage  :    2316 bytes  14.1 % Full
    Data Memory Usage     :    584 bytes   28.5 % Full

Release config:
    Program Memory Usage  :    2204 bytes  13.5 % Full
    Data Memory Usage     :    584 bytes   28.5 % Full

Program:

Code: [Select]

#include <asf.h>

void configure_port_pins(void) {
    struct port_config config_port_pin;
    port_get_config_defaults(&config_port_pin);
    
    config_port_pin.direction = PORT_PIN_DIR_OUTPUT;
    port_pin_set_config(PIN_PA00, &config_port_pin);
}

int main (void) {
    system_init();
    delay_init();
    configure_port_pins();
    
    while(1) {
        port_pin_toggle_output_level(PIN_PA00);
        delay_ms(100);
    }
}
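
The percentages in the build report above can be cross-checked with a little arithmetic. A small sketch, assuming the part has 16 KB of flash and 2 KB of SRAM (my assumption for the D20E14; the report itself does not state the region sizes):

```c
/* Percent of a memory region used, as shown in a build report. */
double percent_full(unsigned used_bytes, unsigned region_bytes)
{
    return 100.0 * (double)used_bytes / (double)region_bytes;
}
```

Under that assumption, `percent_full(2316, 16384)` comes out at about 14.1 and `percent_full(584, 2048)` at about 28.5, matching the report.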

dannyf · « **Reply #83 on:** August 27, 2014, 06:20:07 pm »

Quote

such high memory usage, especially data, for such basic stuff.

Not too bad. Most of that stuff is the start-up code, clock mgmt, and interrupt table.

600 bytes is a little bit too high - I typically get 100 - 200 bytes. But it could have included the zero-initialized data - depending on the compiler used / setting.

westfw · « **Reply #84 on:** August 28, 2014, 07:47:03 am »

Quote

not used to seeing such high memory usage

A lot of those vendor libraries look like they were developed when the smallest chip had 64k+ of flash, so coming up with space-optimized init libraries wasn't at all important. If you really want to use those sub-$1 8-bit replacing ARMs with 16k or less, you may end up doing something else.

For additional comparison, Arduino Due runs about 10k, Teensy 3 about 12k, and Energia (for TI CM4) to about 2.5k (!) A pruned bare-metal Keil/microlib is down to about 300 bytes, some of which is unused code that I can't figure out how to get rid of :-( (That includes the CM4 internally-sourced interrupt vectors and dummy ISRs (up through systick), but not the ones for "external" interrupts.)

dannyf · « **Reply #85 on:** August 28, 2014, 10:46:53 am »

I looked at a particular piece of code I have in mdk - a blinky via rtc. Here is a slimmed-down version of its memory allocation:

1) 44 bytes for gpio related operations;
2) 100 bytes for main();
3) 74 bytes for rtc;
4) 230 bytes of flash and 1536 bytes of ram (for stack + heap) for startup;
5) 112 bytes for stm32f0xx_gpio
6) 172 bytes for stm32f0xx_rcc
7) 492 bytes for stm32f0xx_rtc
8) 24 bytes for stm32f0xx_pwr
9) 356 bytes for system_stm32f0xx
10) 620 bytes for the various libraries linked into the code

So a basic blinky should take 230 + 356 + 112 + 172 + 44 + 100 = 1014 bytes, or about 1KB of flash minimum, plus about 0.6KB of library code and 1.5KB of ram.

This is fairly consistent with the numbers reported earlier.

It also suggests that 4KB of flash is probably the bare minimum, and 8KB is the practical minimum for those chips.
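
The flash estimate above is just a sum over the listed modules. As a quick check, here is a sketch that adds up the same figures. Module names and byte counts are copied from the list; the struct and function names are made up for illustration.

```c
#include <stddef.h>

/* Flash footprint per module in bytes, taken from the list above. */
struct module_size {
    const char *name;
    unsigned flash_bytes;
};

static const struct module_size blinky_modules[] = {
    { "startup",          230 },
    { "system_stm32f0xx", 356 },
    { "stm32f0xx_gpio",   112 },
    { "stm32f0xx_rcc",    172 },
    { "gpio operations",   44 },
    { "main",             100 },
};

/* Sum the flash used by the modules a plain blinky needs. */
unsigned blinky_flash_bytes(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < sizeof blinky_modules / sizeof blinky_modules[0]; i++)
        total += blinky_modules[i].flash_bytes;
    return total;
}
```

The sum is 1014 bytes; adding the 620 bytes of linked library code gives 1634 bytes.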

dgtl · « **Reply #86 on:** August 28, 2014, 04:13:14 pm »

The ST lib is exceptionally bad in code size and thus also speed.
For example, take HAL_RCC_OscConfig from STM32Cube. This is a 315-line super-function that takes a pointer to a large struct. The struct contains a bit-mask to select which osc to configure (HSE/HSI/LSE/LSI/PLL), followed by config variables for all of the oscillators. The function checks the bits in the bitmask and splits into 5 conditional branches. As a project usually does not use all of the oscillators, the user is left with lots of dead code in the conditional paths of this function that never gets executed. In addition, the configuration struct contains configuration variables that are never read. Usually this function is called once, with a constant input, at the beginning of the code, so depending on the input a lot of dead code could be avoided. For a compiler to clean up this mess, link-time optimization is required. Even then, the code makes it especially difficult for the compiler to find the dead code and variables and eliminate them. Why isn't this thing split into separate functions, one for each osc? It would be much better not to introduce dead code at all than to hope for some compiler magic to clean up the mess (unless you are trying to sell a specific compiler that happens to do that?). This is OK for a generic-use library, but not for an embedded system.
Another example is HAL_GPIO_WritePin and similar functions in .c files. If the target state is constant/known (and not provided dynamically in a variable), the task is to write a 32-bit value to a 32-bit register. In the STM library, you get a function that takes a pointer to the GPIO reg base, a bit mask that may have only one bit set, and a boolean. It is in another object file, so the compiler cannot inline and optimize it. Again, unless link-time optimization is used, the simple worst case of 4 bytes of register address, 4 bytes of register value (or we may even use shorter values) and some bytes to perform the write now takes one more constant and many more cycles to execute.
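
To make the GPIO point concrete, here is a minimal sketch of a header-level inline helper. It uses a stand-in register struct so it can run on a PC; `mock_gpio_t` and `pin_write` are hypothetical names, not part of any ST library. It relies on the STM32 BSRR convention that the low 16 bits set pins and the high 16 bits reset them.

```c
#include <stdint.h>

/* Stand-in for the GPIO register block so the sketch runs on a PC.
 * Only BSRR is modelled; on a real part this would be the device
 * header's GPIO struct at its fixed address. */
typedef struct {
    volatile uint32_t BSRR;   /* low half sets pins, high half resets them */
} mock_gpio_t;

/* Hypothetical helper, meant to live in a header so the compiler can
 * inline it. With constant arguments it reduces to one 32-bit store. */
static inline void pin_write(mock_gpio_t *port, uint16_t pin_mask, int high)
{
    port->BSRR = high ? (uint32_t)pin_mask
                      : ((uint32_t)pin_mask << 16);
}
```

Because the function body is visible at the call site, no link-time optimization is needed for the compiler to fold the constant arguments.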

So, the ST libraries just waste the resources. It is quite easy for non-beginners to write their own code and save huge amounts of code space and execution time. The waste of resources is in this case not an issue of ARM CPUs or 32-bit uCs in general, but just a bad implementation of the library.
One may argue that writing your own code takes a lot more time. But when you add up the time spent learning the non-obvious APIs, debugging the issues those APIs cause (why does re-configuring the sysclock source to HSE+PLL magically set up a periodic systick timer interrupt that was not used? did I ask it to?) and debugging the performance issues later on, it can be wise to avoid the bad parts of such libs and use your own.
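
The "one function per oscillator" idea raised for HAL_RCC_OscConfig could be sketched roughly like this. Everything here is illustrative: `mock_rcc_t`, the function names and the bit positions are assumptions made so the example runs on a PC, and a real version would also have to wait for the oscillator-ready flag.

```c
#include <stdint.h>

/* Stand-in for the clock-control register block so this runs on a PC.
 * The single CR field and the bit positions below are illustrative
 * assumptions; check the reference manual for the real layout. */
typedef struct {
    volatile uint32_t CR;
} mock_rcc_t;

#define OSC_HSI_ON (1u << 0)
#define OSC_HSE_ON (1u << 16)

/* One small function per oscillator: a project that never calls
 * osc_enable_hse() does not carry any HSE code. */
static inline void osc_enable_hsi(mock_rcc_t *rcc)
{
    rcc->CR |= OSC_HSI_ON;
}

static inline void osc_enable_hse(mock_rcc_t *rcc)
{
    rcc->CR |= OSC_HSE_ON;
}
```

Compared with a single function that branches on a bitmask, unused oscillators simply never appear in the image, with or without link-time optimization.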

sporadic · « **Reply #87 on:** August 28, 2014, 06:12:54 pm »

Quote from: dannyf on August 27, 2014, 06:20:07 pm

Quote
such high memory usage, especially data, for such basic stuff.

Not too bad. Most of that stuff is the start-up code, clock mgmt, and interrupt table.

600 bytes are little bit too high - I typically get 100 - 200 bytes. But it could have included the zero-initiated data - depending on the compiler used / setting.

Quote from: westfw on August 28, 2014, 07:47:03 am

Quote
not used to seeing such high memory usage
A lot of those vendor libraries look like they were developed when the smallest chip had 64k+ of flash, so coming up with space-optimized init libraries wasn't at all important. If you really want to use those sub-$1 8-bit replacing ARMs with 16k or less, you may end up doing something else.

For additional comparison, Arduino Due runs about 10k, Teensy 3 about 12k, and Energia (for TI CM4) to about 2.5k (!) A pruned bare-metal Keil/microlib is down to about 300 bytes, some of which is unused code that I can't figure out how to get rid of :-( (That includes the CM4 internally-sourced interrupt vectors and dummy ISRs (up through systick), but not the ones for "external" interrupts.)

Just another observation, the Xmega*E5 is available in 8k-32k FLASH / 1k-4k SRAM, whereas the D20E is 16k-256k FLASH / 2k-32k SRAM. Each part essentially being the latest 32tqfp of their 8bit Xmega family and 32bit Cortex M0+ family (Not counting D21 which brings USB). Peripheral wise, I'd consider the D20E as a replacement for the Xmega*D4, but that's a 44tqfp package. Bottom line, even though they require more resources, those resources are given to you. Appreciate all the tests you're doing. Good things to consider when trying to choose a platform.

westfw · « **Reply #88 on:** August 28, 2014, 10:22:18 pm »

Quote

It also suggests that 4KB of flash is probably the bare minimum

Meh. Only if you assume that there won't be improved libraries and compiler options.
Like a lot of software, these are big and bloated mostly because no one has decided that it is important to do anything otherwise.
A company that cares about the difference in price between a 4k device and a 16k device can sure-as-hell spend some time fiddling with the compiler code, or write their code in assembler.

I have "blink" for Stellaris Launchpad in 68 bytes of "obvious" ARM assembler... (Yeah, that means the vector table ONLY has the stack pointer and reset vector. Getting one of those other non-maskable interrupts would probably be not good; perhaps as bad as the usual default infinite-loop ISR... :-))

Code: [Select]

; BLINK in ARM Assembler
; For TI Stellaris/Tiva Launchpad, with LED on PF1..3
; Aug 2014, by Bill Westfield - released to Public Domain.
;
Stack_Size      EQU     0x00000200

                AREA    STACK, NOINIT, READWRITE, ALIGN=3
Stack_Mem       SPACE   Stack_Size
__initial_sp

                PRESERVE8
                THUMB

; Vector Table Mapped to Address 0 at Reset
                AREA    RESET, DATA, READONLY
                EXPORT  __Vectors
__Vectors       DCD     __initial_sp              ; Top of Stack
                DCD     Reset_Handler             ; Reset Handler

; The program itself.
        AREA    |.text|, CODE, READONLY
newmain	PROC
Reset_Handler
		export Reset_Handler
		ldr r0, =0x400FE108  ; Sysctl_rcgc2_r
		ldr	r1, [r0] ;; old val
		orr	r1, r1, #0x20 ;; enable PORTF clk
		str r1, [r0]
		nop				; Wait for PORTF to get clocked.
		
initf	mov r1, #0xE  ;; output bits
		ldr r0, =0x40025000 ;; GPIO_PORTF
		str r1, [r0, #0x400] ; set bit DIR to output
		str r1, [r0, #0x51c] ; DEN  enable digital IO
		mov r2, #8           ; Bit 3: Green LED on Launchpad
loop	ldr r1, [r0, #0x3FC] ; read DATA reg
		eor r1, r2           ; complement bit
		str r1, [r0, #0x3FC] ; write

		mov r1, #(4*1024*1024) ;; Delay count
delay	subs r1, r1, #1			; decrement
		bne	delay
		b	loop
		ENDP
  END
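
For comparison, the same register sequence can be sketched in C. This is not the author's code, just a rough equivalent of the listing above: the offsets and bit values are taken from the assembler, while the function names are made up. The base addresses are passed in as pointers so the sketch can also be run against a plain array on a PC.

```c
#include <stdint.h>

/* Register offsets in bytes, as used in the assembler listing. */
#define GPIO_DATA_OFFSET 0x3FCu
#define GPIO_DIR_OFFSET  0x400u
#define GPIO_DEN_OFFSET  0x51Cu

#define LED_PINS  0x0Eu   /* PF1..PF3 */
#define GREEN_LED 0x08u   /* bit 3, the green LED per the listing */

/* Access a 32-bit register at a byte offset from a port base. */
static inline volatile uint32_t *gpio_reg(volatile uint32_t *base, uint32_t offset)
{
    return base + offset / 4u;
}

/* Enable the PORTF clock, then make PF1..PF3 digital outputs. */
void portf_init(volatile uint32_t *rcgc2, volatile uint32_t *portf)
{
    *rcgc2 |= 0x20u;
    *gpio_reg(portf, GPIO_DIR_OFFSET) = LED_PINS;
    *gpio_reg(portf, GPIO_DEN_OFFSET) = LED_PINS;
}

/* Read-modify-write of the DATA register, like the ldr/eor/str loop. */
void led_toggle(volatile uint32_t *portf)
{
    *gpio_reg(portf, GPIO_DATA_OFFSET) ^= GREEN_LED;
}
```

On the real board the two pointers would be `(volatile uint32_t *)0x400FE108` and `(volatile uint32_t *)0x40025000`, the addresses used in the listing, with a delay loop between calls to `led_toggle`.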

dannyf · « **Reply #89 on:** August 28, 2014, 10:35:30 pm »

Quote

The ST lib is exceptionally bad in code size and thus also speed.

Code size and speed may not be the priority for library developers. I would argue that readability, and reliability are probably tops there, for such a chip typically with many KBs of flash.

Quote

STM32Cube

You may want to take a look at the standard peripheral libraries instead.

Koepi · « **Reply #90 on:** September 02, 2014, 10:32:57 am »

dannyf,

Thanks for this inspiration - I ordered a few STM32f030f4 end of July and some TSSOP20-2-DIP PCBs, but wasn't too sure if that could work at all. As I want stable operations, I added a few capacitors according to the datasheet - they don't cost much, the end result is still pretty close to 1 EUR per µC board.

I built three small PCBs now which work great!

Though I'm working with Eclipse and ST Standard Peripheral Lib, so my code looks a little bit different to yours:

Code: [Select]

#include "stm32f0xx.h"
#include <stdio.h>
#include "main.h"

void init(void) {
  GPIO_InitTypeDef GPIO_InitStructure;
  RCC_AHBPeriphClockCmd(RCC_AHBPeriph_GPIOA, ENABLE);

  // LED: Configure PA0 and PA1 in output pushpull mode
  GPIO_InitStructure.GPIO_Pin = GPIO_Pin_0 | GPIO_Pin_1;
  GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT;
  GPIO_InitStructure.GPIO_OType = GPIO_OType_PP;
  GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
  GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_NOPULL;
  GPIO_Init(GPIOA, &GPIO_InitStructure);
}

int main (void) {
  init();

  GPIOA->BRR = GPIO_Pin_0;  // Set PA0 to GND (LED on)

  while (1) {
    GPIOA->BSRR = GPIO_Pin_1;  // Set PA1 HIGH (LED on)
    Delay(500000L);
    GPIOA->BRR = GPIO_Pin_1;  // Set PA1 to GND (LED off)
    Delay(500000L);
  }
}

void Delay(__IO uint32_t nCount) {
  while(nCount--) {
  }
}

I added two tactile switches, one for pulling nRST down to GND, and another for connecting BOOT0 with nRST. That way it's much more comfortable to flash new iterations of the code to the µC via UART/USB-to-serial.

The µCs run great with more complex code: besides dimming instead of blinking the LED, I added UART output, set the PLL as SysCLK source at 48 MHz, use the ADC, ... Even using the Standard Peripheral Lib, that all still fits into a 5.6 kByte binary.

As far as I can see, you managed to flash the little buggers via ST-Link. I only have two ST-Link v2 USB sticks and the ST-Link v2 on STM Discovery boards. But I run into this issue: connecting to the board works (with the Mac/Linux toolchain), I can read out all information parameters from the µC, and I can even erase the flash. But then something called "flash loader" gets uploaded to SRAM, and after that I run into a timeout. Thus I can only upload code via UART for now. In Windows, the ST utils only detect the supply voltage and show an unknown processor.

Did you use something "unusual"? For example, ST-Link v1? How did you manage that?

dannyf · « **Reply #91 on:** September 02, 2014, 11:09:50 am »

Quote

I built three small PCBs now which work great!

Cause for celebration!

Nice job. I made one with proper pins - machined pins, but the rest with just 22awg wires. I use jump wires to connect it to the programmer or external circuitry.

Quote

my code looks a little bit different to yours:

My code was really meant to be broken up into multiple modules to be integrated into a project. It was rewritten into one giant file for ease of compilation. Yours will work just fine.

Quote

Did you use something "unusual"?

I wasn't quite sure if I understood your description.

I burned the user code via two approaches:

1) uart (through ST Flash Loader Demonstrator): the connection there (via a usb-ttl converter) is RX/TX (pin 17/18 from memory), Vcc/GND, plus BOOT0 tied to nRST (on the chip). I never burned any bootloader onto the chip - the chip came with a bootloader in its system memory (ST term for rom).

2) stlink (a stlink v2 clone): the connection there is SWDIO, SWDCLK (pin 19/20 from memory), Vcc/GND, plus BOOT0 to GND. Again, all user code, no bootloader.

Hope it helps. If you can clarify your issue, I may be able to help more.

Koepi · « **Reply #92 on:** September 02, 2014, 11:26:40 am »

Thanks for the fast reply!

Sorry for not being clear about the issue in the first post already: uploading a binary via SWD fails; everything else (uploading via UART, using the peripherals of the µC, ...) works like a charm. Yes, the code shown in the first post works properly. No issues there.

The lengthy, explanatory intro was meant as possible addition to enhance the minimum board.

Quote from: dannyf on September 02, 2014, 11:09:50 am

2) stlink (a stlink v2 clone): the connection there is SWDIO, SWDCLK (pin 19/20 from memory), Vcc/GND, plus BOOT0 to GND. Again, all user code, no bootloader.

There is the difference already! I didn't pull BOOT0 to GND when trying to flash the µC via SWD. The internal pulldown of the pin should be active after start/reset anyways if I understand the datasheet correctly.
It didn't seem necessary, as the whole flash process works up to "flash erase". On all my boards the SWD upload process fails after that: it uploads some flash loader code to the SRAM and executes it, which obviously doesn't work, as the process then times out. I already tried flashing an older firmware into the ST-Link clone (which magically made it work with an STM32F100 board, by the way).

I will try a "hard connection to GND" via wire for BOOT0 tonight.
EDIT: Sorry, I will use a 10k resistor soldered to GND. So the buttons/switches will still work. Found some other reports that a floating BOOT0 pin may cause flash writes to fail.

It's not that it would really matter, but I'd like the small bugger to work properly "with everything" and not only with UART upload; this feels like it's broken, if you know what I mean

dannyf · « **Reply #93 on:** September 02, 2014, 01:21:16 pm »

The STM32F100 discovery board is a st link, I think. Yes, contrary to the claim made by ST, those little things work just fine as a regular ST Link on stm32 devices.

Quote

I will use a 10k resistor soldered to GND.

It should work. I used a jump wire. With a floating BOOT0, the target will not be detected by ST Link. Alternatively, you can try holding down the button while flashing it.

Once soldered to ground, however, you have to use a strong pull-up to be able to run the bootloader. Connecting BOOT0 to nRST will no longer work, as the weak pull-up on nRST isn't able to overcome the 10K pulldown resistor.

Koepi · « **Reply #94 on:** September 02, 2014, 01:27:26 pm »

Thanks again, dannyf!

Ok, as the space is very limited on those PCBs, I will first do the test and hold the "nRST->BOOT0" button and try SWD again.

Edit: I also made a small clip from the board "in action" (well - boring, just dimming the LED and spitting out info via UART):

Edit2: On the net there is the info that NRST has a 100kOhm pullup - which seems wrong. The datasheet itself (http://www.st.com/web/en/resource/technical/document/datasheet/DM00088500.pdf) says on page 58 typically 40kOhm, min 25kOhm, max 55kOhm internal pullup/pulldown for all pins, on page 62 30kOhm min, 40 typical and 50kOhm max for NRST. I will try a weaker pulldown for BOOT0, say 100kOhm or more.
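
A rough way to compare pulldown values: when the button ties BOOT0 to nRST, the internal nRST pull-up and the external BOOT0 pulldown form a resistive divider. A minimal sketch, assuming the typical 40kOhm nRST pull-up quoted above and ignoring any other pull on the node and input leakage; the function name is made up.

```c
/* Voltage on a node pulled up through r_up and down through r_down,
 * expressed as a fraction of the supply (plain resistive divider). */
double divider_fraction(double r_up_ohm, double r_down_ohm)
{
    return r_down_ohm / (r_up_ohm + r_down_ohm);
}
```

With a 40k pull-up, a 10k pulldown leaves the node at 0.2 of the supply, 100k at about 0.71, and 250k at about 0.86. Whether a given fraction counts as a logic high depends on the BOOT0 input threshold in the datasheet, which is not quoted here.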

bingo600 · « **Reply #95 on:** September 02, 2014, 03:03:56 pm »

I ordered 10 of the small STs, but decided to "leave the Ghetto" and get 10 nice PCBs for $7

http://www.aliexpress.com/item/10-pcs-Free-ship-stm32f030F4p6-system-board-learning-board-stm32-stm32f03-empty-plate/2030923590.html

They seemed easier to populate

A finished board on ali (too expensive), but it shows the component placement.
http://www.aliexpress.com/item/stm32f030F4P6-stm32-Cortex-M0-system-board-learning-board-development-board-evaluation-board/1701304725.html

Hopefully I'll get the stuff during the next few weeks.

/Bingo

dannyf · « **Reply #96 on:** September 02, 2014, 05:26:17 pm »

Quote

I will try a weaker pulldown for BOOT0, say 100kOhm or more.

That wouldn't be able to pull BOOT0 down far enough to produce a reliable '0'. 10K is the right number if you wish to implement a pull-down. I would try pressing the button for now.

Koepi · « **Reply #97 on:** September 02, 2014, 06:30:39 pm »

I even dared using a weak pulldown of 250kOhm.

Success!
- µC starts reliably now; before I needed to plug in power some times until the µC executed the program.
- BOOT0 switch still works for flashing via UART.
- In Windows, ST-Util from ST now detects the µC properly. I connected RST from ST-Link clone to nRST of the µC board. Flashing a .hex worked. Debugging/stepping through the program/reading registers, all works!

Unfortunately, texane/st-link still fails to flash the program, so I'm still stuck with UART flashing. But at least it works on the µC, and it seems to be just a software issue on Mac OS X.

Soldered a 250kOhm pulldown to the other two boards now as well. It helps that the PCB is double sided, so it was easy to find some space.

Thanks for all those hints, dannyf. Now I'm really satisfied with the minimum DIY ARM microcontroller board

dannyf · « **Reply #98 on:** September 02, 2014, 08:01:00 pm »

Quote

Success!

A round of beers for everyone!

Quote

Unfortunately,...

If you got ST-Link Utility to work, your SWD connections are good. Everything from this point onward is software, and debugging at source code level should work now.

Not knowing what "texane" is, I cannot be of much more help to you.

neslekkim · « **Reply #99 on:** September 02, 2014, 08:49:10 pm »

Quote from: dannyf on September 02, 2014, 08:01:00 pm

Not knowing what "texane" is, I cannot be of much more for you.

Opensource stlink software: https://github.com/texane/stlink

