If you run the Mealy simulator you will see that it actually detects on clock cycle 3. It also fires on cycle 5 like the Moore.
Both of them are buggy. The most fundamental problem is that you do not have enough states. For the Moore state machine you need 5 states: representing that you have matches the last 0, 1, 2, 3, or 4 clock cycles (not counting the current cycle). Only in the final state is the output high. The mealy state machine likewise needs 4 states, representing if the previous 0, 1, 2, or 3 clock cycles matched, and then the output is a logical AND of being in the final state and the input matching the 4th bit.
Your two state machines are stylistically different which makes it harder than necessary to compare. For instance, one uses a reset, the other uses an initial block. One uses a single clocked block, the other uses a combinatoric block to compute the next state and a tiny clocked process to update the state variable. Especially if you want to contrast the different behavior between the two, I would try to make the code look as close to identical as possible, and have the test bench the same as well.
You use an always block for combinatoric logic with an explicit sensitivity list: "always@(state or inp)" Don't ever do this in synthesizable code. Use always @(*). If you can use System Verilog blocks you can also use always_comb.
I would strongly advice drawing a state diagram, including the initial/idle state on a whiteboard. Then carefully indicate how each state will be encoded in the register, and what the value of the output will be. Then write next to each bubble "output = XXX". For the moore version, that will be a constant. For the Mealy it can be an expression of the input.
Once you do that, it's very difficult to code it incorrectly.