Just going to interject my 2 cents now that this discussion has been going for a while..
If you have access to a JLink then also have a look at a tool from Segger called SystemView. You can use it to profile your application's CPU time, not only as a percentage count but also on a timeline graph with all the context switches and IRQs that are bouncing around in the system. It can be useful for detecting issues like priority inversion and such. I think there is a FreeRTOS port for it.
It runs off Segger's RTT backend, which is a FIFO structure for streaming trace data to the host machine. The application puts in event messages whenever it is doing something, so this is an instrumentation profiler, and the host just empties the buffer as fast as it can. I think there is also a second channel for redirecting printf() messages, so you can annotate your logs.
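The printf() side is then roughly a call like this; a minimal sketch assuming Segger's RTT sources are already in the project (the LogState function and the message text are just made-up examples):

```c
#include "SEGGER_RTT.h"   /* ships with Segger's RTT / SystemView packages */

/* Push a short annotation string up to the host over RTT channel 0
   (the default terminal channel), so it shows up alongside the trace. */
static void LogState(int newState)
{
    SEGGER_RTT_printf(0, "state -> %d\r\n", newState);
}
```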
Unfortunately, since I presume this is a commercial project, it's not free for you.. But if you need to dig a lot deeper into this it might be of use. A sampling profiler is usually also fine to get a rough idea of what is costing CPU time. Even a human-based sampler, i.e. running the application and halting it a couple of times at random, is usually good enough to get an idea.
If you want your application to be as robust as possible against timing side effects, then having everything event based is ideal. Usually all my RTOS tasks, if I use an RTOS, enter an infinite loop and then wait for signals/queue data before they start working on a particular command/action (see the sketch below). Those signals are sent by other tasks and/or IRQs. This means the tasks are never time-based sleeping but are always waiting; if they have work they go right into it. It's similar to (but with more overhead than) an event-based framework, which doesn't need an RTOS kernel at all but then loses preemptive switching. I may want to transition to that in the future (I have some ideas for this tailored towards low-power applications, which as you may know is my largest interest).
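To make the "waiting all the time" part concrete, here is a minimal sketch in CMSIS-RTOS v1 style, since that is the API I mention below; the signal bit, task name and ProcessData() handler are made-up placeholders:

```c
#include "cmsis_os.h"            /* CMSIS-RTOS v1 API: osSignal*, osWaitForever */

#define SIG_DATA_READY  (1 << 0) /* hypothetical signal bit */

osThreadId workerId;             /* filled in when the thread is created */

extern void ProcessData(void);   /* hypothetical: drain the RX buffer, etc. */

/* Worker task: no osDelay() anywhere, it just blocks until it is signalled. */
void WorkerTask(void const *arg)
{
    (void)arg;
    for (;;) {
        osEvent evt = osSignalWait(SIG_DATA_READY, osWaitForever);
        if (evt.status == osEventSignal) {
            ProcessData();
        }
    }
}

/* IRQ handler: stash the data somewhere, then just wake the task. */
void UART_IRQHandler(void)
{
    /* ...read the hardware into a buffer... */
    osSignalSet(workerId, SIG_DATA_READY);
}
```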
On the contrary, as Doctorandus says, if your application polls with osDelay()s instead, it's more prone to falling over when the CPU load changes drastically. It doesn't have to be dramatic, like the application hanging; its maximum throughput may simply drop, for example.
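For contrast, the polling pattern looks roughly like this (same made-up names as in the sketch above):

```c
#include "cmsis_os.h"

extern int  DataIsPending(void);  /* hypothetical check of a flag/buffer */
extern void ProcessData(void);

/* Polling pattern: wake up every 10 ms whether or not anything happened.
   If the CPU gets busier, these wakeups get delayed and throughput drops. */
void PollingTask(void const *arg)
{
    (void)arg;
    for (;;) {
        if (DataIsPending()) {
            ProcessData();
        }
        osDelay(10);              /* fixed, time-based sleep */
    }
}
```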
I think a WFI() would always be safe to add in a bare idle() task. The scheduler tick of an RTOS usually runs from a timer (= an IRQ), or parts of it run on demand (usually also originating from an IRQ or a chain of task executions). The context switching code may also run in its own IRQ in some RTOS designs. E.g. if you have an IRQ that sends data to a task to process further, the RTOS could check in its osSignalSet() function whether that task was waiting for that signal and whether it is OK to do a preemptive switch straight away (e.g. based on task priorities). If you write your application completely based on these kinds of signals or other IPC constructs (like queues etc.), then you could disable the scheduler tick, as it won't contribute anything to the scheduling behaviour; everything is handled "on demand" in e.g. the osSignalSet (task activation) and osSignalWait (task deactivation) functions. You would probably lose the osDelay() functions though, as that is the reason the scheduler runs on a fixed time interval (1 kHz = 1 ms ticks).
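For the WFI part, a minimal sketch of what I mean, assuming FreeRTOS (since a FreeRTOS port was mentioned) on a Cortex-M; the device header name is just an example, use whatever your part ships with:

```c
#include "FreeRTOS.h"     /* idle hook needs configUSE_IDLE_HOOK = 1 in FreeRTOSConfig.h */
#include "stm32f4xx.h"    /* hypothetical device header; pulls in CMSIS and __WFI() */

/* Called by FreeRTOS whenever nothing else is ready to run. */
void vApplicationIdleHook(void)
{
    /* Sleep the core until the next interrupt (scheduler tick, UART, DMA, ...).
       Execution resumes right here and the idle task keeps looping. */
    __WFI();
}
```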