I am not sure I know what you mean by elegant. There are some pretty fundamental trade-offs in object detection, and no real way around them that I know of. Edge detection, for example, requires an absolute minimum of comparing two pixels (and performs poorly at that minimum level). So the only way to further reduce processing is to reduce pixel count. If you know a priori the size of the objects you wish to detect, you can match the resolution to that size and reduce pixel count. Or you can reduce processing by only looking for changes around previously detected objects. That leads to a whole family of solutions that do intensive processing occasionally, with delta processing on the frames in between. As you pointed out, reducing frame rate is also an option.
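To make the "intensive processing occasionally, delta processing in between" idea concrete, here is a minimal sketch in plain Python. It assumes greyscale frames stored as lists of rows and a single bright object on a dark background; the names `scan`, `track`, `make_frame`, `KEYFRAME_INTERVAL`, `MARGIN` and `THRESHOLD` are all made up for illustration, and the values are arbitrary. It is not meant as a real tracker, only as a way to see where the savings come from.

```python
KEYFRAME_INTERVAL = 10  # assumed: do a full-frame scan every 10th frame
MARGIN = 2              # assumed: half-width of the search window, in pixels
THRESHOLD = 128         # assumed: the object is brighter than this grey level


def make_frame(width, height, x, y):
    """Build a dark test frame with one bright pixel at (x, y)."""
    frame = [[0] * width for _ in range(height)]
    frame[y][x] = 255
    return frame


def scan(frame, x0, y0, x1, y1):
    """Look for bright pixels in the window [x0, x1) x [y0, y1).

    Returns (centroid or None, number of pixels examined).
    """
    sum_x = sum_y = count = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if frame[y][x] > THRESHOLD:
                sum_x += x
                sum_y += y
                count += 1
    examined = (x1 - x0) * (y1 - y0)
    centroid = (sum_x // count, sum_y // count) if count else None
    return centroid, examined


def track(frames):
    """Full scan on keyframes (or after losing the object), windowed scan otherwise."""
    last = None
    results = []
    for i, frame in enumerate(frames):
        height, width = len(frame), len(frame[0])
        if last is None or i % KEYFRAME_INTERVAL == 0:
            position, cost = scan(frame, 0, 0, width, height)
        else:
            cx, cy = last
            position, cost = scan(
                frame,
                max(0, cx - MARGIN), max(0, cy - MARGIN),
                min(width, cx + MARGIN + 1), min(height, cy + MARGIN + 1),
            )
        results.append((position, cost))
        last = position
    return results


if __name__ == "__main__":
    frames = [make_frame(20, 20, 5 + i, 5) for i in range(5)]
    for position, cost in track(frames):
        print(position, cost)
```

The second value in each result is the number of pixels examined for that frame, which is the cost being traded. The sketch only holds up if the object moves less than `MARGIN` pixels per frame; if it moves further, the windowed scan loses it and the next frame falls back to a full scan.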
The on-chip processing approaches mentioned above don't fundamentally change any of this; they merely apply custom hardware to part of the solution and reduce the size of the data channel heading downstream.
None of these solutions works in the general case, but each can work quite well in a well-defined problem space. As an example, the processing required to detect a red ball in a green field can be quite simple. Detecting a single, moving, contrasting object of known size in a greyscale image is also relatively easy and computationally achievable. Can you narrow down your objective to allow some form of simplification?
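To show how little work the red-ball-in-a-green-field case can need, here is a rough sketch in plain Python. It assumes the image is a list of rows of `(r, g, b)` tuples and that "red" simply means the red channel beats the other two by some margin; the function name `find_red_ball` and the `margin` value of 50 are my own choices for illustration, not anything standard.

```python
def find_red_ball(image, margin=50):
    """Return (centroid_x, centroid_y, pixel_count) of red-dominant pixels, or None.

    A pixel counts as red when its red channel exceeds both green and blue
    by more than `margin` (an assumed, hand-picked threshold).
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            if r > g + margin and r > b + margin:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count, count)


if __name__ == "__main__":
    green = (30, 160, 40)
    red = (200, 30, 30)
    # 10 x 8 green field with a 3 x 3 red patch covering x = 4..6, y = 2..4
    image = [[green] * 10 for _ in range(8)]
    for y in range(2, 5):
        for x in range(4, 7):
            image[y][x] = red
    print(find_red_ball(image))
```

That is one pass over the image with two comparisons per pixel and no edge detection at all. It only works because the problem has been narrowed so far: add a second red object, or lighting that washes out the colour, and it falls apart.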