Yeah, they just need a little gain at low frequency, but not down to DC. So it's AC coupled, with a passband of, apparently, about 0.5 to 5 Hz. That corresponds roughly to the signal frequency expected from one of those multi-faceted-lens people detectors: as long as a warm body is moving across the field of view, its image alternately comes into and out of view of each facet, producing a fairly specific alternating signal.
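As a rough illustration of how such a passband comes about, each corner of a first-order RC section sits at f = 1/(2*pi*R*C). The sketch below uses made-up component values (they are not taken from the schematic under discussion) chosen only to land near 0.5 Hz and 5 Hz.

```python
import math


def corner_frequency(r_ohms, c_farads):
    """-3 dB corner of a first-order RC section, in Hz."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


# Hypothetical values, for illustration only (not from the actual circuit).
R_COUPLING = 33e3    # ohms, resistor in series with the coupling cap
C_COUPLING = 10e-6   # farads, coupling cap (sets the high-pass corner)
R_FEEDBACK = 1e6     # ohms, feedback resistor
C_FEEDBACK = 33e-9   # farads, cap across the feedback resistor (low-pass corner)

f_low = corner_frequency(R_COUPLING, C_COUPLING)
f_high = corner_frequency(R_FEEDBACK, C_FEEDBACK)

print(f"passband roughly {f_low:.2f} Hz to {f_high:.2f} Hz")
```

With these assumed values the corners come out a little under 0.5 Hz and a little under 5 Hz; the real parts will differ.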
The diode probably clamps the first stage's output to about +/-1V around its quiescent point (very roughly, depending on just where the bias ends up), so that the second stage's input isn't driven beyond the rails. Without the clamp, the coupling cap could push that input above or below the supply rails, which an LM358 doesn't appreciate much. The second stage doesn't need a clamp of its own, because its output is centered around Vcc/2 and drives a window comparator whose input range includes the 358's output range.
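To put numbers on the clamp argument, here is a crude worst-case sketch. It assumes a fast swing at the first stage's output passes straight through the coupling cap onto the second stage's input node, which sits at some bias voltage. The supply, bias and swing figures are hypothetical, picked only to show the idea.

```python
def coupled_input_extremes(v_bias, swing):
    """Lowest and highest voltage at the coupled input for a +/-swing step."""
    return v_bias - swing, v_bias + swing


def within_rails(v_low, v_high, v_neg=0.0, v_pos=5.0):
    """True if both extremes stay inside the supply rails."""
    return v_neg <= v_low and v_high <= v_pos


V_BIAS = 2.5            # volts, assumed bias at the second stage's input
CLAMPED_SWING = 1.0     # volts, the roughly +/-1V the diode clamp allows
UNCLAMPED_SWING = 3.5   # volts, assumed swing if the first stage could hit its limits

clamped_ok = within_rails(*coupled_input_extremes(V_BIAS, CLAMPED_SWING))
unclamped_ok = within_rails(*coupled_input_extremes(V_BIAS, UNCLAMPED_SWING))

print("clamped stays inside rails:", clamped_ok)
print("unclamped stays inside rails:", unclamped_ok)
```

This ignores the series resistance and the op-amp's own feedback, so treat it as a bound on what could happen, not a simulation.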
Tim