I like to drive repetitive tasks by setting a periodic interrupt (a heartbeat, if you will) at whatever the system sample rate is. (If you need a particular sample rate for, say, DAQ applications, this is ideal. If you just need system monitoring and task-switching sorts of things, the rate can be arbitrary, or even variable, say by sleeping during CPU idle time to save power.) Then, events are triggered by a division of that interrupt: keep a static counter, increment it on every interrupt, and when it reaches the threshold, reset it and fire the next stage.
So, you can have priority and real-time events that fire every time (a divisor of 1 -- they're just executed straightaway), lower-priority or less frequent events (say, scanning for user I/O, or low-baud-rate ports, or acquiring low-bandwidth inputs like temperature), and very infrequent things (like, uh, if you implemented an RTC in software: every time seconds roll over into minutes, and so on).
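The divider scheme might look something like this in C (a sketch only -- the task names, the divisors, and the assumed 1 kHz heartbeat are all placeholders for whatever your project needs; the call counters are just there so you can watch the dispatch work):

```c
#include <stdint.h>

/* Call counters, so the dispatch can be observed. The task bodies are
   hypothetical stand-ins. */
volatile uint32_t rt_calls, io_calls, hk_calls;

static void task_realtime(void)     { rt_calls++; /* e.g., DAQ sample  */ }
static void task_io_scan(void)      { io_calls++; /* e.g., buttons, slow UART */ }
static void task_housekeeping(void) { hk_calls++; /* e.g., software RTC tick  */ }

/* Divisors relative to the heartbeat rate (assumed 1 kHz here). */
#define IO_SCAN_DIV       10u    /* fires at 100 Hz */
#define HOUSEKEEPING_DIV  1000u  /* fires at 1 Hz   */

/* Called once per heartbeat, from the periodic timer ISR. */
void heartbeat_tick(void)
{
    static uint16_t io_cnt, hk_cnt;

    task_realtime();   /* divisor of 1: runs on every pass */

    if (++io_cnt >= IO_SCAN_DIV) {
        io_cnt = 0;
        task_io_scan();
    }
    if (++hk_cnt >= HOUSEKEEPING_DIV) {
        hk_cnt = 0;
        task_housekeeping();
    }
}
```

After 1000 ticks of a 1 kHz heartbeat (one second), the real-time task has run 1000 times, the I/O scan 100 times, and housekeeping once.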
As for priority, here's what you do: if you can't ensure that every one of these functions will, in the worst case, complete in one pass before the next interrupt, enable interrupts and let it fire again! The interrupt calls the high-rate tasks, and doesn't call the low-rate tasks because they're not due yet. Furthermore, you can add a static flag to each task so that, if it's currently busy (i.e., that function is still on the stack), it won't be re-entered.
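The busy-flag guard could be sketched like so (hypothetical names throughout; the `simulate_reentry` hook just fakes a nested heartbeat firing mid-task, since there's no real ISR on a PC):

```c
#include <stdint.h>
#include <stdbool.h>

volatile uint32_t slow_task_runs = 0;
bool simulate_reentry = false;   /* test hook: fake a nested interrupt */

/* A low-rate task guarded against re-entry: if the previous pass is still
   on the stack when its divider fires again, the new call just bails. */
void slow_task(void)
{
    static volatile bool busy = false;

    if (busy)
        return;                  /* still running from an earlier heartbeat */
    busy = true;

    /* ... long-running work; with interrupts enabled, the heartbeat can
       fire mid-task. Simulate that here by re-calling ourselves once. */
    if (simulate_reentry) {
        simulate_reentry = false;
        slow_task();             /* the guard makes this a no-op */
    }
    slow_task_runs++;

    busy = false;
}
```

Note the flag is cleared on every exit path -- if the task could `return` early from the middle, you'd want the same cleanup there too, or the task would be locked out forever.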
For measuring execution time, you could add a few more static variables to each task: when a task starts, a timestamp is recorded (which might be timer_counter_register combined with heartbeats_since_boot, so you get a microseconds-accurate measure -- cool!), and when it finishes, the difference against the current timestamp is recorded.
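Combining the coarse heartbeat count with the timer's fine count could look like this (a sketch with a simulated timebase, since there's no hardware here; on a real part, `read_timer_count()` would read the timer's count register, and the 1 MHz / 1 kHz figures are assumptions):

```c
#include <stdint.h>

/* Simulated timebase for this sketch. On hardware, the heartbeat ISR
   increments heartbeats_since_boot, and fake_timer is the timer's own
   count register, wrapping every heartbeat period. */
volatile uint32_t heartbeats_since_boot = 0;
uint32_t fake_timer = 0;

#define TIMER_TICKS_PER_HEARTBEAT 1000u  /* 1 MHz timer, 1 kHz heartbeat */

static uint32_t read_timer_count(void)
{
    return fake_timer;   /* stand-in for reading the count register */
}

/* Microsecond-resolution timestamp: coarse part from the heartbeat
   counter, fine part from the timer count within the current period. */
uint64_t timestamp_us(void)
{
    return (uint64_t)heartbeats_since_boot * TIMER_TICKS_PER_HEARTBEAT
         + read_timer_count();
}

uint64_t task_last_duration_us;

void task_timed(void)
{
    uint64_t start = timestamp_us();

    /* Simulate 2.5 heartbeats' worth of work (2500 us). */
    heartbeats_since_boot += 2;
    fake_timer += 500;

    task_last_duration_us = timestamp_us() - start;
}
```

One real-hardware wrinkle this sketch glosses over: if the timer rolls over between reading the heartbeat counter and the count register, you can get a torn reading, so you'd typically read coarse / fine / coarse and retry on mismatch.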
What you wouldn't be able to do is task management: in principle, you could, kind of, pull a stalled task (one that hasn't terminated in so-and-so cycles) off the stack. Buuuut.... then you need to know ALL the stack memory that task used, at the point it was last interrupted. You could trace the stack, reason your way back to where it was, then calculate what stack offset it started on (which could also be recorded as another static variable, for simplicity's sake). But, I mean, that's a huge mess. At that point, you'd be better off using a full task-switching architecture, where multiple stacks are used (hope you have lots of RAM!), all the registers (and I/O state, if shared between tasks -- things can get complicated quickly!) are saved for each task, and tasks are queued in a suitable fashion (round robin, cooperative, preemptive, etc.). And at that point, you should hope to have an MMU as well, and a kernel to orchestrate all of this; pretty soon you'll be wishing for a chip that can boot Linux, and let that be that.
And, regarding multitasking: allowing a task to be interrupted is key. Simply enabling interrupts allows this, but it's crude and uncontrolled. You could assign a priority flag per task, which gets checked to decide whether the currently executing task can be preempted; or you could poll for waiting events, or a termination flag, within the task's main loop. Loops are especially important: you might think a given section has no way to lock up, but unless you're proving that statement for every possible input -- checking bounds, doing basic input validation, or even just fuzzing the function to see if some fraction of its possible inputs causes bogus operation -- who knows? You're setting yourself up for bugs there!
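Polling a termination flag inside a bounded loop could look like this (a sketch; `process_queue`, `stop_requested`, and the hard iteration cap are all made-up names for illustration):

```c
#include <stdint.h>
#include <stdbool.h>

volatile bool stop_requested = false;  /* set by a higher-priority event */

/* A bounded worker loop: it polls the termination flag every iteration,
   and also caps total iterations, so no input can make it spin forever.
   Returns how many items it actually processed. */
uint32_t process_queue(uint32_t items, uint32_t max_iterations)
{
    uint32_t done = 0;

    while (done < items) {
        if (stop_requested)
            break;                     /* yield to whatever preempted us */
        if (done >= max_iterations)
            break;                     /* hard bound: guaranteed exit */

        /* ... process one item ... */
        done++;
    }
    return done;
}
```

The hard cap is the part that's easy to skip and easy to regret: even if you're sure `items` is always sane, the bound turns "should terminate" into "provably terminates."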
Keep this in mind: say you implement the function on a PC. Suppose the function accepts, say, two WORD parameters. You can exhaustively search the entire parameter space in only 4 billion steps. That'll take maybe a few minutes to a day on a desktop, for a lot of the compact functions you'd use in an embedded project. And you can measure the execution time (well, grossly perhaps, if you don't have a high-resolution counter like RDTSC to use) and compare the function's output to a table of precalculated values, or a mathematical expression, or something like that. Even if the table is several gigs long, that's not a huge burden either! With in-depth testing being so cheap, writing truly iron-clad functions is easier than ever.
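A harness for that exhaustive sweep might look like this (a sketch -- `sat_add16`, a 16-bit saturating add, is a hypothetical stand-in for whatever function you're porting, and the reference model is just the "obvious" wide-arithmetic version to compare against):

```c
#include <stdint.h>

/* Stand-in for the embedded function under test: 16-bit saturating add. */
uint16_t sat_add16(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;
    return (sum > 0xFFFFu) ? 0xFFFFu : (uint16_t)sum;
}

/* Reference model: the slow, obviously-correct version. */
uint16_t sat_add16_ref(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return (sum > 65535u) ? 65535u : (uint16_t)sum;
}

/* Sweep the two-WORD parameter space, comparing against the reference;
   returns the number of mismatches. stride = 1 is the full 4-billion-step
   sweep; a larger stride gives a fast spot check of the same harness. */
uint64_t exhaustive_check(uint32_t stride)
{
    uint64_t failures = 0;

    for (uint32_t a = 0; a < 0x10000u; a += stride)
        for (uint32_t b = 0; b < 0x10000u; b += stride)
            if (sat_add16((uint16_t)a, (uint16_t)b) !=
                sat_add16_ref((uint16_t)a, (uint16_t)b))
                failures++;

    return failures;
}
```

Run with `exhaustive_check(1)` for the full 2^32 sweep; on a modern desktop with optimization on, a comparison this simple finishes in seconds to minutes, so there's little excuse not to.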
But yeah, tons of ways to go about things!
Tim