With this code:
while (1) {
    GPIOD->ODR ^= GPIO_PIN_12;
}
I get 12 MHz on my STM32F4 Discovery board, running at 168 MHz. The assembly code for it (Keil uVision) looks like this:
|L8.246|
LDR r1,[r0,#0]
EOR r1,r1,#0x1000
STR r1,[r0,#0]
B |L8.246|
So the loop should run much faster, but the APB1 peripherals can be clocked at 42 MHz at most, so I guess wait states are added when accessing the port, and the read followed by a write probably takes even longer.
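To put a number on that, here is a small sketch of the arithmetic. The function name is made up for illustration, and it assumes the measured 12 MHz is the full square wave frequency on the pin, so each loop iteration (one XOR) produces only one edge.

```c
/* Rough timing arithmetic; the function name is hypothetical. */

/* CPU cycles spent per loop iteration.
 *   f_cpu_hz:              core clock
 *   f_pin_hz:              square wave frequency measured on the pin
 *   iterations_per_period: loop iterations needed for one full output
 *                          period (2 for the XOR loop, because each
 *                          iteration produces only one edge) */
double cycles_per_iteration(double f_cpu_hz, double f_pin_hz,
                            double iterations_per_period)
{
    return f_cpu_hz / (f_pin_hz * iterations_per_period);
}
```

Calling `cycles_per_iteration(168e6, 12e6, 2.0)` gives 7, so the four instructions of the loop take about 7 core cycles per pass.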
Another trick for fast access to individual GPIO pins, without first loading the other bits of the register for the usual load/modify/write cycle, is the bit-banding feature:
http://mightydevices.com/?p=144
Note that this works even for the internal SRAM, which is pretty cool. I couldn't find a bit-band macro in CubeMX, but it is easy to calculate the address yourself. This code uses bit-banding for the pin toggling:
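volatile void* address = &GPIOD->ODR;   // register that contains the bit
uint32_t bit = 12;                      // bit number within that register
uint32_t bb_base = 0x42000000;          // start of the peripheral bit-band alias region
uint32_t bb_ref = 0x40000000;           // start of the peripheral bit-band region
// every bit of the bit-band region maps to one 32-bit word in the alias region
volatile uint32_t* pin = (volatile uint32_t*)(bb_base + ((uint32_t)address - bb_ref) * 32 + bit * 4);
while (1) {
    *pin = 1;
    *pin = 0;
}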
It toggles the pin at 21.2 MHz, which is the fastest you can do for GPIO (the GPIO clock is 42 MHz). The assembly code looks like this (only the loop is shown):
|L8.244|
STR r2,[r0,#0]
STR r1,[r0,#0]
B |L8.244|
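The address calculation from the C code above can be wrapped in a small helper that also covers the SRAM bit-band region. This is only a sketch: the macro and function names are made up for illustration, and the region constants are the standard Cortex-M3/M4 bit-band addresses.

```c
#include <stdint.h>

/* Cortex-M3/M4 bit-band regions and their alias regions. */
#define BB_PERIPH_REGION 0x40000000u
#define BB_PERIPH_ALIAS  0x42000000u
#define BB_SRAM_REGION   0x20000000u
#define BB_SRAM_ALIAS    0x22000000u

/* Hypothetical helper: returns the alias word address for bit `bit`
 * of the location `addr`, which must lie inside the given bit-band
 * region. Each bit occupies one 32-bit word in the alias region,
 * hence the factors 32 (bytes of alias per byte) and 4 (bytes per bit). */
uint32_t bitband_alias(uint32_t region, uint32_t alias,
                       uint32_t addr, uint32_t bit)
{
    return alias + (addr - region) * 32u + bit * 4u;
}
```

On the target you would use it along the lines of `volatile uint32_t *pin = (volatile uint32_t *)bitband_alias(BB_PERIPH_REGION, BB_PERIPH_ALIAS, (uint32_t)&GPIOD->ODR, 12);` and then write 1 or 0 to `*pin`.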
Of course, you could do this with the BSRR register as well, but bit-banding is very useful in general. You shouldn't write an integer counter to ODR without first loading the old value and masking the bits you want to change, unless you are sure that it doesn't harm any of the other hardware that is connected to your pins.
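As a sketch of both points, the helpers below build the words for the BSRR variant and for a masked ODR update. The function names are made up for illustration. The BSRR layout assumed here is the usual STM32 one: writing a 1 to bits 0..15 sets the corresponding pin, and writing a 1 to bits 16..31 resets it.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of any vendor library. */

/* Word to write to BSRR to set pin `pin` (0..15). */
uint32_t bsrr_set_word(unsigned pin)
{
    return 1u << pin;
}

/* Word to write to BSRR to reset pin `pin` (0..15). */
uint32_t bsrr_reset_word(unsigned pin)
{
    return 1u << (pin + 16u);
}

/* New ODR value that changes only the bits selected by `mask`
 * and keeps all other bits of `old_odr` as they were. */
uint32_t odr_masked_update(uint32_t old_odr, uint32_t mask, uint32_t value)
{
    return (old_odr & ~mask) | (value & mask);
}
```

On the target this would look like `GPIOD->BSRR = bsrr_set_word(12);` followed by `GPIOD->BSRR = bsrr_reset_word(12);`, assuming your device header exposes BSRR as a single 32-bit register (some older headers split it into BSRRL and BSRRH). A counter on the lower four pins would be written as `GPIOD->ODR = odr_masked_update(GPIOD->ODR, 0x000F, counter);`.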
BTW: If you need to output lots of data fast and without gaps in the stream, and still want to do something in parallel with your CPU, you could try the SRAM controller with DMA transfers (if your STM32 chip has an SRAM controller). It worked pretty well when I tried it with two DIY 4-bit R2R DACs, even though the datasheet says that circular mode (needed for gapless transfers, so that you can prepare the second buffer while the first one is being transferred, and vice versa) is not allowed for memory-to-memory transfers:
https://plus.google.com/+FrankBussProgrammer/posts/jQidPgbtKoW