Renesas/Intersil TW9990 and TW9992 are composite video decoder ICs available in singles from Mouser for 6-7€.
TW9990 has an 8-bit 4:2:2 YCbCr parallel digital output, and TW9992 has a MIPI CSI-2 output.
Something like ESP32-P4 should work with TW9992. With TW9990, you'd need 8-bit parallel data input plus sync signal inputs, and sufficient RAM for a full frame (something like 768 pixels per scan line, only part of which will contain data, and 625 scan lines for a full frame).
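To put a number on "sufficient RAM", here is a rough back-of-the-envelope sketch. It assumes the 768 samples per line and 625 lines mentioned above, and that 8-bit 4:2:2 YCbCr averages two bytes per pixel (one luma byte plus a shared chroma byte). The 720x576 active-area figure is my assumption based on the usual PAL/BT.601 active picture size, not something taken from the TW9990 datasheet.

```python
# Rough frame-buffer estimate for an 8-bit 4:2:2 YCbCr stream.
# All defaults are assumptions for illustration, not datasheet values.

SAMPLES_PER_LINE = 768  # per the estimate above; includes blanking
LINES_PER_FRAME = 625   # full PAL frame, including vertical blanking
BYTES_PER_PIXEL = 2     # 4:2:2 averages one Y byte plus one Cb/Cr byte per pixel


def frame_bytes(samples_per_line=SAMPLES_PER_LINE,
                lines=LINES_PER_FRAME,
                bytes_per_pixel=BYTES_PER_PIXEL):
    """Return the number of bytes needed to buffer one frame."""
    return samples_per_line * lines * bytes_per_pixel


if __name__ == "__main__":
    full = frame_bytes()            # everything, blanking included
    active = frame_bytes(720, 576)  # assumed active picture area only
    print(f"full frame:  {full} bytes ({full / 1024:.1f} KiB)")
    print(f"active area: {active} bytes ({active / 1024:.1f} KiB)")
```

Either way you are looking at somewhat under 1 MiB per frame, which rules out most small microcontrollers unless they have external RAM.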
Instead, I would go for one of the really small Linux SBCs, perhaps an Orange Pi Zero 2W (~20€ from AliExpress), or a NanoPi NEO LTS or NanoPi NEO Air ($15 to $25 from FriendlyElec), together with one of the $10-$20 eBay EzCap composite-video-to-USB 2 capture devices that advertise Linux compatibility. I avoid Raspberry Pis because their USB hardware is faulty and I just don't like how the Foundation operates. The SBCs I listed have spare GPIO pins you can use to drive an external relay for handling the door button. At something like $50/50€ in total for a device with upstream Debian/Armbian support, this would make for a quite compact unit that would not consume too much power either.
The video interface will be USB Video (UVC), which means a large set of standard Linux UVC/Video4Linux utilities will work for both video and individual image capture. You can also run Nginx or Apache as your HTTPS server over WiFi on the SBC, allowing secure access to the SBC from e.g. your phone without relying on cloud services. With a bit of help from ffmpeg, you could even stream the audio and video to the web page (and audio back to the door phone), with a door opening button, assuming you do the security correctly (either client and server certificates, or IP-tied cookie-based user password authentication).
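As a minimal sketch of the single-image capture part, something like the following could grab one frame through ffmpeg's v4l2 input. The device node /dev/video0, the output name door.jpg, and the helper names are all assumptions for illustration; check which node your capture device actually gets, and whether it needs an explicit input or video standard selected first.

```python
# Grab a single still image from a V4L2 capture device using ffmpeg.
# Device path and file name are hypothetical defaults.
import shlex
import subprocess


def snapshot_command(device="/dev/video0", output="door.jpg"):
    """Build the ffmpeg argument list for capturing one frame."""
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it exists
        "-f", "v4l2",       # read from a Video4Linux2 device
        "-i", device,
        "-frames:v", "1",   # stop after one video frame
        output,
    ]


def take_snapshot(device="/dev/video0", output="door.jpg"):
    """Run ffmpeg; raises CalledProcessError if the capture fails."""
    subprocess.run(snapshot_command(device, output), check=True)


if __name__ == "__main__":
    # Only print the command here; call take_snapshot() on the SBC itself.
    print(shlex.join(snapshot_command()))
```

A web backend could call take_snapshot() when the doorbell is pressed and serve the resulting file; continuous streaming would need a different ffmpeg invocation.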