Author Topic: Best MCU for the lowest input capture interrupt latency (Read 21192 times)

jemangedeslolos · « **on:** March 30, 2022, 04:36:44 pm »

Hello,

I need to generate pulses triggered by a..... pulse.

I quickly tested this by feeding a pulse into an input capture pin on a dsPIC33EP MCU @ 140 MHz (without any signal conditioning).
Inside the ISR, I generate a pulse and measure a little less than 600 ns of latency between the input and output rising edges (nothing in the while loop; everything is inside the ISR).

I wonder if there are any tricks to reduce this latency. I don't have a goal to achieve; it is just out of curiosity.
Do you know of any MCUs that are much better in this area?

Thank you very much

nctnico · « **Reply #1 on:** March 30, 2022, 04:49:44 pm »

There are MCUs with programmable state machine / timer units. For example NXP LPC1500 series. Then there is no software involvement at all.

tggzzz · « **Reply #2 on:** March 30, 2022, 04:51:43 pm »

Quote from: jemangedeslolos on March 30, 2022, 04:36:44 pm

Hello,

I need to generate pulses triggered by a..... pulse.

I quickly tested this by feeding a pulse into an input capture pin on a dsPIC33EP MCU @ 140 MHz (without any signal conditioning).
Inside the ISR, I generate a pulse and measure a little less than 600 ns of latency between the input and output rising edges (nothing in the while loop; everything is inside the ISR).

I wonder if there are any tricks to reduce this latency. I don't have a goal to achieve; it is just out of curiosity.
Do you know of any MCUs that are much better in this area?

Thank you very much

The XMOS xCORE processors have a processing time that is guaranteed by design. None of the traditional "execute, then measure, then hope you have captured the worst case" rubbish.

They can guarantee receiving and processing 100Mb/s ethernet bit streams in software, without a dedicated ethernet peripheral.

The IDE tells you how long it takes to get from "here" to "there". Latency is minimised: no caches, no interrupts, just one of many cores sitting idle until it needs to do something.

However, often latency is less important than jitter.

The most rigorous technique is to avoid going through the processor altogether, either by external hardware/FPGAs or by imaginative use of peripherals in a specific processor.

DavidAlfa · « **Reply #3 on:** March 30, 2022, 04:57:55 pm »

You're missing a lot of details.
Compiler? Optimization level? Can you attach the ISR code?

600 ns is ~84 instructions at 140 MHz, so clearly your code is either poorly optimized or something is taking a lot of time in the ISR, e.g. context saving/restoring.
If you do simple tasks and avoid calling other functions inside the ISR, it should save less data and thus provide a faster interrupt.

JPortici · « **Reply #4 on:** March 30, 2022, 05:25:41 pm »

Quote from: DavidAlfa on March 30, 2022, 04:57:55 pm

You're missing a lot of details.
Compiler? Optimization level? Can you attach the ISR code?

600 ns is ~84 instructions at 140 MHz, so clearly your code is either poorly optimized or something is taking a lot of time in the ISR, e.g. context saving/restoring.
If you do simple tasks and avoid calling other functions inside the ISR, it should save less data and thus provide a faster interrupt.

42, as instructions take two clock cycles.
still too long.
I would expect in the order of 20 or less, as interrupts on dsPIC are fast.
(14 cycles after IF is set)

care to show some code?
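
The correction in this reply (84 oscillator clocks, but only 42 instruction cycles) is plain unit conversion. A minimal sketch in C, assuming the two-clocks-per-instruction figure stated in the thread; the function names are hypothetical and not part of any vendor library:

```c
#include <stdint.h>

/* Number of clock periods that fit into a latency.
 * At 1 MHz one period lasts 1000 ns, so periods = ns * MHz / 1000. */
uint32_t latency_in_clocks(uint32_t latency_ns, uint32_t clock_mhz)
{
    return (latency_ns * clock_mhz) / 1000u;
}

/* Convert the same latency into instruction cycles.
 * clocks_per_instruction = 2 is the dsPIC33E figure quoted in the thread
 * (an assumption taken from the discussion, not from a datasheet). */
uint32_t latency_in_instruction_cycles(uint32_t latency_ns,
                                       uint32_t fosc_mhz,
                                       uint32_t clocks_per_instruction)
{
    return latency_in_clocks(latency_ns, fosc_mhz) / clocks_per_instruction;
}
```

With the numbers from the original post, `latency_in_clocks(600, 140)` gives 84 and `latency_in_instruction_cycles(600, 140, 2)` gives 42, which is still well above the roughly 20 cycles expected here.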

DavidAlfa · « **Reply #5 on:** March 30, 2022, 05:31:02 pm »

2 cycles? Oh, I see it's different for dsPIC33E. I used dsPIC33F; they were slower but did 1 instruction/clock.
I remember adding some attributes to the ISR giving much faster response.
Maybe it was this...

Code:

void __attribute__((interrupt, shadow)) _YourISR(void)

But as far as I know, there's only one shadow register set available, so you must ensure there're no other interrupts being called on top of your ISR, that means, your ISR must have TOP priority.
Using Shadow, the context is automatically saved and restored in hardware, greatly reducing the latency.
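
A minimal sketch of what such an ISR could look like, assuming XC16, input capture channel IC1 and output pin RB0 (both chosen arbitrarily for illustration, since the OP did not post code or a part number). It also assumes the compiler defines `__XC16__`; the `#else` branch only provides host stand-ins so the sketch compiles off-target, and `FAST_ISR` is a hypothetical macro name:

```c
#include <stdint.h>

#ifdef __XC16__
#include <xc.h>
/* Attributes as quoted in the thread: hardware shadow registers, no PSV save. */
#define FAST_ISR __attribute__((interrupt, shadow, no_auto_psv))
#else
/* Host stand-ins for the device registers; these are NOT real hardware. */
volatile struct { unsigned IC1IF : 1; } IFS0bits;
volatile struct { unsigned LATB0 : 1; } LATBbits;
volatile uint16_t IC1BUF;
#define FAST_ISR
#endif

/* Input capture 1 interrupt: drive the output first, do housekeeping after. */
void FAST_ISR _IC1Interrupt(void)
{
    LATBbits.LATB0 = 1;   /* time-critical action: raise the output pin */
    (void)IC1BUF;         /* read the capture buffer so it does not fill up */
    IFS0bits.IC1IF = 0;   /* clear the interrupt flag before returning */
}
```

The pin write comes first so that nothing but the hardware entry latency and the compiler's prologue sits between the input edge and the output edge. Ending the pulse (for example from a timer or output compare) is left out of the sketch.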

SiliconWizard · « **Reply #6 on:** March 30, 2022, 05:32:22 pm »

Look for any MCU that has "peripheral trigger" capability. Quite a few out there these days.

Note that while this *usually* has lower latency than anything interrupt-based, the external trigger signal must still be synchronized internally, so you will still have a couple clock pulses of latency.

JPortici · « **Reply #7 on:** March 30, 2022, 05:33:53 pm »

Quote from: DavidAlfa on March 30, 2022, 05:31:02 pm

2 cycles? Oh, I see it's different for dsPIC33E. I used dsPIC33F; they were slower but did 1 instruction/clock.

Yes, in dsPIC33C and dsPIC33E, instructions are pipelined in two stages and flash is 48 bits + 8 bits wide (so that two instructions are fetched at the same time, plus optional Hamming code parity), with the obvious higher latencies on branches but overall higher throughput.

Anyway the OP should either

-show some code (both interrupt and initialization of the peripheral; I don't remember if the IF flag is synchronized with FCY or with the IC clock, which would slow things down)
-state the minimum acceptable latency

In any case, I would probably use a hardware trigger, too.

T3sl4co1l · « **Reply #8 on:** March 30, 2022, 06:03:59 pm »

Pretty sure I could do that with the AVR-DA I've been playing with. It could be something like: input into event into TCD "fault", fault clear --> begin output pulses (delayed by, I think 1-3 clock cycles, at 24MHz max), TCD overflow event --> TCA count event, TCA overflow --> disable TCD or input event to terminate pulse train. And if the pulse train isn't too fast, TCD overflow interrupt could set new PWM/freq settings during it.

Assuming you mean pulses triggered by pulse.

Oh, if the input isn't a sustained level, then route it through the CCL (configurable custom logic) first, as a flip-flop say.

I think many of these peripherals have been in use, in some form or another, in other MCP products -- tinyAVRs have 'em, I think lots of PICs had CCL or something similar; maybe some of these more advanced timers too, I don't know, but probably something that can be adapted.

As for raw interrupt latency, not sure about other systems, but AVR is fairly exemplary on this, though it's rare you get to use it at its fullest capacity, if ever. The interrupt itself occurs in a couple cycles (finish current instruction + jump into interrupt vector), but the IVT only has space for two words, traditionally a JMP .isr instruction. If you already have address and value loaded into dedicated registers, you could put a ST [X], r25; RETI or whatever in there, and that's it. If you are guaranteed not to be using the interrupts immediately following it, you could write more code into the IVT. And, if compiled, the typical overhead goes something like: IVT JMP, preamble (PUSH stack pointer, flags, registers), actual code, postamble (POP everything), RETI. Which will give something like a fractional microsecond at the best of times; actually, not quite true, I've seen GCC emit the actual bare minimum when it's a copy between fixed memory locations -- no need to dirty multiple registers, save just the one and go.

Similar things apply to other platforms; ARM for example stores the return address in LR, which you'd better push to the stack before doing much else (and depending on when/how interrupts might be enabled during the current ISR, if that's optional or automatic or whatever?). Maybe some operations can be done entirely from register, maybe you need to push a whole bunch of them to do much of anything.

And yeah, because of pipelining, the interrupt itself costs at least that much; as a result, latencies tend not to go down much, even as you go up in frequency and power. Even if the CPU is that much faster, if all the IO operations have to propagate through bus interfaces and caches, it's still true. So like, buffered IO is mandatory for modern peripherals, that can accumulate fractional MBs in the time taken to empty their buffers. Fast response is best handled by hardware or configured logic, which, brings us back to the first point: now that MCUs are often integrating such logic, it's quite possible now that something even sub-cycle can be pulled off with them. But you'll have to go shopping, and read datasheets in detail, to figure out whether any particular thing is possible.

Tim

Sal Ammoniac · « **Reply #9 on:** March 30, 2022, 06:14:44 pm »

You can do this pretty damn fast in an FPGA, and you can have dozens or even hundreds (depending on how many I/O pins the FPGA has) of pulse inputs and outputs.

Kleinstein · « **Reply #10 on:** March 30, 2022, 06:28:27 pm »

Some hardware triggering is supported by some µCs. Most of the STM32 can do this and AFAIK also STM8...
The idea would be to use the external signal to trigger / start a timer that produces a pulse. AFAIK many PICs also have timers with a trigger.

For interrupt latency the AVRs can be relatively good when programmed in ASM, though the clock speed is limited. I remember code with an ISR call every 26 cycles that still had some 10% of the processing power left for the main program.

A compiler tends to add quite some overhead to the ISR code, like an extra jump vector or more registers saved to the stack.

jpanhalt · « **Reply #11 on:** March 30, 2022, 08:14:35 pm »

An interrupt usually requires saving context. Of course, with some chips, you don't need to do that. Since "everything" is in the ISR, why have an ISR? Polling a single bit, even with a mid-range PIC at 32 MHz, takes 2 Tcy (250 ns) + a little housekeeping.
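
The 250 ns figure follows from the fact that a mid-range PIC executes one instruction cycle (Tcy) per four oscillator periods. A small sketch of that arithmetic in C; the function names are hypothetical, and the result covers only the two polling instruction cycles, not the "little housekeeping" mentioned above:

```c
#include <stdint.h>

/* Mid-range PIC: Tcy = 4 / Fosc. With Fosc in MHz the result is in ns. */
uint32_t midrange_pic_tcy_ns(uint32_t fosc_mhz)
{
    return 4000u / fosc_mhz;
}

/* Time taken by a polling sequence that costs tcy_count instruction cycles. */
uint32_t poll_response_ns(uint32_t fosc_mhz, uint32_t tcy_count)
{
    return midrange_pic_tcy_ns(fosc_mhz) * tcy_count;
}
```

At 32 MHz, `midrange_pic_tcy_ns(32)` is 125 ns, so a two-cycle poll (`poll_response_ns(32, 2)`) comes out at 250 ns.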

Sal Ammoniac · « **Reply #12 on:** March 30, 2022, 08:27:15 pm »

How about a PIC32MZ? Runs at 252 MHz and has register banks to reduce interrupt latency down to a minimum.

8goran8 · « **Reply #13 on:** March 30, 2022, 09:00:41 pm »

You can run a test using an 8-bit or 16-bit Microchip PIC with Core Independent Peripherals.

JPortici · « **Reply #14 on:** March 31, 2022, 03:06:31 am »

Quote from: Sal Ammoniac on March 30, 2022, 08:27:15 pm

How about a PIC32MZ? Runs at 252 MHz and has register banks to reduce interrupt latency down to a minimum.

besides the fact that unless you write the ISR in asm interrupt latency in MIPS is horrific compared to dsPIC...
peripherals run at 50MHz max and in those the interrupt flags generation latency is tied to the peripheral clock
in dsPIC33E peripherals run at 70MHz max, latency is lower also because there are register banks (at least the shadow bank)

tggzzz · « **Reply #15 on:** March 31, 2022, 09:58:33 am »

Quote from: jpanhalt on March 30, 2022, 08:14:35 pm

An interrupt usually requires saving context. Of course, with some chips, you don't need to do that. Since "everything" is in the ISR, why have an ISR? Polling a single bit, even with a mid-range PIC at 32 MHz, takes 2 Tcy (250 ns) + a little housekeeping.

The xCORE devices take that approach, but are better.

It can do that with all input ports simultaneously, using one core per port.

The waiting is done in hardware, and there are simple language constructs to achieve it.

You can also "multiplex" multiple ports, timers/timeouts, etc into a single core, and the hardware implements which branch of the "case statement" to execute.

Hardware records the clock cycle at which input occurs or output will occur.

Yes, effectively there is a simple RTOS in silicon.

Buy them from DigiKey

mon2 · « **Reply #16 on:** March 31, 2022, 11:31:48 am »

XCORE +1 ; there is a learning curve but may be a good fit; check if silicon is available; I do monitor that forum as well along with others to assist if you need help
FPGA +1 ; Gowin / Efinix Trion are low cost devices that will also fit your project; low cost kits available from both vendors

newbrain · « **Reply #17 on:** March 31, 2022, 12:06:00 pm »

Quote from: mon2 on March 31, 2022, 11:31:48 am

XCORE +1 ; there is a learning curve but may be a good fit; check if silicon is available; I do monitor that forum as well along with others to assist if you need help
FPGA +1 ; Gowin / Efinix Trion are low cost devices that will also fit your project; low cost kits available from both vendors

To have the best of both worlds (not really*, but worth checking) the PSoC5 family has a decent amount of FPGA-like programmable logic that can be used to generate a (train of) pulse(s) from a pin change event.
The programmable logic can be seamlessly connected with DMA and IRQ to provide higher level control and monitoring from the CPU.
Verilog is supported, if one does not want to use the IDE.

*There are drawbacks: Windows only IDE, not much support, high cost for the single part, cheap devkits but not currently available...

tggzzz · « **Reply #18 on:** March 31, 2022, 12:20:58 pm »

Quote from: mon2 on March 31, 2022, 11:31:48 am

XCORE +1 ; there is a learning curve but may be a good fit; check if silicon is available; I do monitor that forum as well along with others to assist if you need help

Agreed.

When I first "kicked the tyres", I was amazed at how fast I got things working. There were zero unexpected surprises; everything just "worked as it says on the tin". I've never had that with any other MCU.

Key points: the xCORE hardware and xC software have been designed so they work together seamlessly. Each complements the other.

The documentation is brief and accurate, and I didn't see any errata or caveats. Quite remarkable; I wish all devices were as easy!

The IO ports are nicely designed and have many FPGA attributes. Clock rate and width can be chosen independently of everything else, SERDES, various simple modes such as strobe, latch, etc. Can ignore the input until == or != an arbitrary value. Timers capture the clock cycle at which input occurs or output will occur, or a timeout occurs.

Simple language constructs capture all of those hardware capabilities, including wait until input from this port or that port or timeout or output completed.

Within a day of starting, I was able to get it to:

- capture two serial input streams at 62.5 Mb/s, guaranteeing by design not to miss any bits
- count all the 0->1 transitions
- do front panel operations
- communicate with a PC over USB

That would have been impossible with other MCUs. At the very least I would have had to take many measurements to assess the performance, and hope that I had captured the worst case.

The XMOS devices have FPGA like attributes, and sit in a niche between conventional MCUs and FPGAs.

JPortici · « **Reply #19 on:** March 31, 2022, 12:30:20 pm »

Without changing the MCU, if the OP would also reveal the full part number, he could be helped to make the best use of the available peripherals to reduce latency (PTG / OC trigger).

Maybe the ISR is not actually needed, or it's needed to control the length of the pulse, and provided the pulse is at least N* cycles long it could be done without much effort

*say 20 cpu cycles, it shouldn't take more to enter the ISR, move a word from ram and to the OC compare register and let it synchronize

DavidAlfa · « **Reply #20 on:** March 31, 2022, 12:48:56 pm »

I don't think he needs to change the whole system for this; PICs don't have a big interrupt latency.
It's the compiler's context-saving overhead that usually causes it, and it can be optimized by trying a few things.
I recall having this issue years ago; I can't remember the exact details, but I greatly reduced the ISR delay.
- Avoid calling other functions in the ISR.
- Use static or global volatiles instead of the stack.
- By default, shadow registers are not used, as there's only one set!
- If timing is that critical, set the ISR to the highest priority and enable the shadow registers.

This should reduce the context saving to the minimum but, as I already said, you must ensure no other interrupt happens on top of that one.

Code:

void __attribute__ (( interrupt, shadow, no_auto_psv)) yourISR(void)

Anyways, lately there're lots of threads with a lot of unnecessary feedback where the OP flew away.
So I'd avoid extending this discussion further until he gives signs of life, adding some code/details.

So many times we squeeze our brains, and when the code is finally shown, it's an ugly abomination, explaining all the issues...

tggzzz · « **Reply #21 on:** March 31, 2022, 01:38:53 pm »

The OP stated

Quote from: jemangedeslolos on March 30, 2022, 04:36:44 pm

I don't have a goal to achieve; it is just out of curiosity.

and that's just fine.

I don't think specific information is required, but he is after general/universal techniques. Good for him!

HwAoRrDk · « **Reply #22 on:** March 31, 2022, 03:30:03 pm »

Quote from: Kleinstein on March 30, 2022, 06:28:27 pm

Some hardware triggering is supported by some µCs. Most of the STM32 can do this and AFAIK also STM8...
The idea would be to use the external signal to trigger / start a timer that produces a pulse.

Yes, STM8 can do this, but I have found that there are at least 3 cycles of latency (timer clock cycles, that is, not CPU) between the trigger and the timer start/reset, due to synchronisation. This makes it not so useful for some applications.

Siwastaja · « **Reply #23 on:** March 31, 2022, 05:49:03 pm »

Peripherals may or may not have exactly what you need, depending on the specifics of your use case.

But Cortex-M4 and M7 MCUs running at some 200 MHz are not expensive, and with 2-3 cycles for synchronization, a 12-cycle interrupt latency, and maybe 3-4 cycles for the IO, you can do it in around 100 ns. 500 MHz Cortex-M7 parts are a bit more expensive, but still only around the $10 mark, and will bring that down to 50 ns. The advantage of the general-purpose interrupt/software solution is independence from the peripheral resources. You can make the most critical IO operation the very first thing in the ISR, and then keep doing something else, for complex cases.
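
The cycle budget in this reply can be written down as a small calculation. The sketch below is only an illustration of that arithmetic: the struct and function names are hypothetical, and the cycle counts are the estimates quoted above rather than measured or datasheet values.

```c
#include <stdint.h>

/* One entry per stage between the input edge and the output edge. */
typedef struct {
    uint32_t sync_cycles;      /* input synchronizer (2-3 in the estimate) */
    uint32_t irq_entry_cycles; /* interrupt entry (12 in the estimate) */
    uint32_t io_write_cycles;  /* store reaching the pin (3-4 in the estimate) */
} latency_budget_t;

/* Sum of all stages, in CPU cycles. */
uint32_t budget_total_cycles(const latency_budget_t *b)
{
    return b->sync_cycles + b->irq_entry_cycles + b->io_write_cycles;
}

/* Convert the budget to nanoseconds for a given CPU clock in MHz. */
uint32_t budget_ns(const latency_budget_t *b, uint32_t cpu_mhz)
{
    return (budget_total_cycles(b) * 1000u) / cpu_mhz;
}
```

Taking the upper end of each range (3 + 12 + 4 = 19 cycles) gives 95 ns at 200 MHz and 38 ns at 500 MHz, consistent with the "around 100 ns" and "50 ns" figures. Real parts may add flash wait states or bus latency on top of this.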

If something even quicker is needed, in very simple cases a good old logic IC could do (say, a flip-flop or a logic gate). If you need more than 2-3 of them, you would likely want to go for a CPLD.

NorthGuy · « **Reply #24 on:** March 31, 2022, 09:01:47 pm »

Quote from: jemangedeslolos on March 30, 2022, 04:36:44 pm

I quickly tested this by feeding a pulse into an input capture pin on a dsPIC33EP MCU @ 140 MHz (without any signal conditioning).
Inside the ISR, I generate a pulse and measure a little less than 600 ns of latency between the input and output rising edges (nothing in the while loop; everything is inside the ISR).

If I remember correctly, interrupt latency is 10 instruction cycles, so at 70 MHz instruction clock (that's what you get with oscillator at 140 MHz), you should get 150 ns latency. You do not need any context save to drive a pin, so you can do it right upon ISR entry, giving you the response time under 200 ns. Of course, if you write in C there will be a lot of overhead, so you must write the ISR in assembler.

Newer dsPICs (CK and CH), which can run at a 100 MHz instruction clock, can give you roughly 120 ns.

With FPGA, you can get the reaction time under 10 ns.

