I like to drive repetitive tasks by setting a periodic interrupt (a heartbeat, if you will) at whatever the system sample rate is. (If you need a particular sample rate for, say, DAQ applications, this is ideal. If you just need system monitoring and task-switching sorts of things, the rate can be arbitrary, or even variable, say by sleeping during CPU idle time to save power.) Then, events are triggered by a division of that interrupt: keep a static counter, increment it on every interrupt, and when it reaches the threshold, reset it and fire the next stage.
So, you can have priority and real-time events that fire every time (a divisor of 1 -- they're just executed straightaway), lower-priority or less frequent events (say, scanning for user I/O, or low-baud-rate ports, or acquiring low-bandwidth inputs like temperature), and very infrequent things (like, uh, if you implemented an RTC in software: every time seconds roll over into minutes, and so on).
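The divider scheme might look something like this in C (a sketch only -- the task names, the divisors, and the assumed 1 kHz heartbeat are all placeholders for whatever your project needs; the call counters are just there so you can watch the dispatch work):

```c
#include <stdint.h>

/* Call counters, so the dispatch can be observed. The task bodies are
   hypothetical stand-ins. */
volatile uint32_t rt_calls, io_calls, hk_calls;

static void task_realtime(void)     { rt_calls++; /* e.g., DAQ sample  */ }
static void task_io_scan(void)      { io_calls++; /* e.g., buttons, slow UART */ }
static void task_housekeeping(void) { hk_calls++; /* e.g., software RTC tick  */ }

/* Divisors relative to the heartbeat rate (assumed 1 kHz here). */
#define IO_SCAN_DIV       10u    /* fires at 100 Hz */
#define HOUSEKEEPING_DIV  1000u  /* fires at 1 Hz   */

/* Called once per heartbeat, from the periodic timer ISR. */
void heartbeat_tick(void)
{
    static uint16_t io_cnt, hk_cnt;

    task_realtime();   /* divisor of 1: runs on every pass */

    if (++io_cnt >= IO_SCAN_DIV) {
        io_cnt = 0;
        task_io_scan();
    }
    if (++hk_cnt >= HOUSEKEEPING_DIV) {
        hk_cnt = 0;
        task_housekeeping();
    }
}
```

After 1000 ticks of a 1 kHz heartbeat (one second), the real-time task has run 1000 times, the I/O scan 100 times, and housekeeping once.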
As for priority, here's what you do: if you can't ensure that every one of these functions will, in the worst case, complete in one pass before the next interrupt, enable interrupts and let it fire again! The interrupt calls the high-rate tasks, and doesn't call the low-rate tasks because they're not due yet. Furthermore, you can add a static flag to each task so that, if it's currently busy (i.e., that function is still on the stack), it won't be re-entered.
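The busy-flag guard could be sketched like so (hypothetical names throughout; the `simulate_reentry` hook just fakes a nested heartbeat firing mid-task, since there's no real ISR on a PC):

```c
#include <stdint.h>
#include <stdbool.h>

volatile uint32_t slow_task_runs = 0;
bool simulate_reentry = false;   /* test hook: fake a nested interrupt */

/* A low-rate task guarded against re-entry: if the previous pass is still
   on the stack when its divider fires again, the new call just bails. */
void slow_task(void)
{
    static volatile bool busy = false;

    if (busy)
        return;                  /* still running from an earlier heartbeat */
    busy = true;

    /* ... long-running work; with interrupts enabled, the heartbeat can
       fire mid-task. Simulate that here by re-calling ourselves once. */
    if (simulate_reentry) {
        simulate_reentry = false;
        slow_task();             /* the guard makes this a no-op */
    }
    slow_task_runs++;

    busy = false;
}
```

Note the flag is cleared on every exit path -- if the task could `return` early from the middle, you'd want the same cleanup there too, or the task would be locked out forever.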
For measuring execution time, you could add a few more static variables to each task: when a task starts, a timestamp is recorded (which might be timer_counter_register combined with heartbeats_since_boot, so you get a microseconds-accurate measure -- cool!), and when it finishes, the difference against the current timestamp is recorded.
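Combining the coarse heartbeat count with the timer's fine count could look like this (a sketch with a simulated timebase, since there's no hardware here; on a real part, `read_timer_count()` would read the timer's count register, and the 1 MHz / 1 kHz figures are assumptions):

```c
#include <stdint.h>

/* Simulated timebase for this sketch. On hardware, the heartbeat ISR
   increments heartbeats_since_boot, and fake_timer is the timer's own
   count register, wrapping every heartbeat period. */
volatile uint32_t heartbeats_since_boot = 0;
uint32_t fake_timer = 0;

#define TIMER_TICKS_PER_HEARTBEAT 1000u  /* 1 MHz timer, 1 kHz heartbeat */

static uint32_t read_timer_count(void)
{
    return fake_timer;   /* stand-in for reading the count register */
}

/* Microsecond-resolution timestamp: coarse part from the heartbeat
   counter, fine part from the timer count within the current period. */
uint64_t timestamp_us(void)
{
    return (uint64_t)heartbeats_since_boot * TIMER_TICKS_PER_HEARTBEAT
         + read_timer_count();
}

uint64_t task_last_duration_us;

void task_timed(void)
{
    uint64_t start = timestamp_us();

    /* Simulate 2.5 heartbeats' worth of work (2500 us). */
    heartbeats_since_boot += 2;
    fake_timer += 500;

    task_last_duration_us = timestamp_us() - start;
}
```

One real-hardware wrinkle this sketch glosses over: if the timer rolls over between reading the heartbeat counter and the count register, you can get a torn reading, so you'd typically read coarse / fine / coarse and retry on mismatch.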
What you wouldn't be able to do is task management: in principle, you could, kind of, pull a stalled task (one that hasn't terminated in so-and-so cycles) off the stack. Buuuut.... then you need to know ALL the stack memory that task used, at the point it was last interrupted. You could trace the stack, reason your way back to where it was, then calculate what stack offset it started on (which could also be recorded as another static variable, for simplicity's sake). But, I mean, that's a huge mess. At that point, you'd be better off using a full task-switching architecture, where multiple stacks are used (hope you have lots of RAM!), all the registers (and I/O state, if shared between tasks -- things can get complicated quickly!) are saved for each task, and tasks are queued in a suitable fashion (round robin, cooperative, preemptive, etc.). And at that point, you should hope to have an MMU as well, and a kernel to orchestrate all of this; pretty soon you'll be wishing for a chip that can boot Linux, and let that be that.
And, regarding multitasking: allowing a task to be interrupted is key. Simply enabling interrupts allows this, but it's crude and uncontrolled. You could assign a priority flag per task, which gets checked to decide whether the currently executing task can be preempted; or you could poll for waiting events, or a termination flag, within the task's main loop. Loops are especially important: you might think a given section has no way to lock up, but unless you're proving that statement for every possible input -- checking bounds, doing basic input validation, or even just fuzzing the function to see if some fraction of its possible inputs causes bogus operation -- who knows? You're setting yourself up for bugs there!
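Polling a termination flag inside a bounded loop could look like this (a sketch; `process_queue`, `stop_requested`, and the hard iteration cap are all made-up names for illustration):

```c
#include <stdint.h>
#include <stdbool.h>

volatile bool stop_requested = false;  /* set by a higher-priority event */

/* A bounded worker loop: it polls the termination flag every iteration,
   and also caps total iterations, so no input can make it spin forever.
   Returns how many items it actually processed. */
uint32_t process_queue(uint32_t items, uint32_t max_iterations)
{
    uint32_t done = 0;

    while (done < items) {
        if (stop_requested)
            break;                     /* yield to whatever preempted us */
        if (done >= max_iterations)
            break;                     /* hard bound: guaranteed exit */

        /* ... process one item ... */
        done++;
    }
    return done;
}
```

The hard cap is the part that's easy to skip and easy to regret: even if you're sure `items` is always sane, the bound turns "should terminate" into "provably terminates."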
Keep this in mind: say you implement the function on a PC. Suppose the function accepts, say, two WORD parameters. You can exhaustively search the entire parameter space in only 4 billion steps. That'll take maybe a few minutes to a day on a desktop, for a lot of the compact functions you'd use in an embedded project. And you can measure the execution time (well, grossly perhaps, if you don't have a high-resolution counter like RDTSC to use) and compare the function's output to a table of precalculated values, or a mathematical expression, or something like that. Even if the table is several gigs long, that's not a huge burden either! With in-depth testing being so cheap, writing truly iron-clad functions is easier than ever.
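A harness for that exhaustive sweep might look like this (a sketch -- `sat_add16`, a 16-bit saturating add, is a hypothetical stand-in for whatever function you're porting, and the reference model is just the "obvious" wide-arithmetic version to compare against):

```c
#include <stdint.h>

/* Stand-in for the embedded function under test: 16-bit saturating add. */
uint16_t sat_add16(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;
    return (sum > 0xFFFFu) ? 0xFFFFu : (uint16_t)sum;
}

/* Reference model: the slow, obviously-correct version. */
uint16_t sat_add16_ref(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return (sum > 65535u) ? 65535u : (uint16_t)sum;
}

/* Sweep the two-WORD parameter space, comparing against the reference;
   returns the number of mismatches. stride = 1 is the full 4-billion-step
   sweep; a larger stride gives a fast spot check of the same harness. */
uint64_t exhaustive_check(uint32_t stride)
{
    uint64_t failures = 0;

    for (uint32_t a = 0; a < 0x10000u; a += stride)
        for (uint32_t b = 0; b < 0x10000u; b += stride)
            if (sat_add16((uint16_t)a, (uint16_t)b) !=
                sat_add16_ref((uint16_t)a, (uint16_t)b))
                failures++;

    return failures;
}
```

Run with `exhaustive_check(1)` for the full 2^32 sweep; on a modern desktop with optimization on, a comparison this simple finishes in seconds to minutes, so there's little excuse not to.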
But yeah, tons of ways to go about things!
Tim