I didn't know about 2D PSDs. Is Hamamatsu the only company that makes this type of sensor?
From what I know of Hamamatsu, they are very good...but quality comes with a price.
The use of the lens + camera is solidifying as an excellent medium-cost idea.
With artificial light and control over what the camera sees (a 3D-printed dome, for example), you won't even need much processing. Just scan the pixels and find the coordinates of the dot.
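To make "scan the pixels and find the dot" concrete, here is a minimal sketch of an intensity-weighted centroid. It assumes the image is already a plain list of rows of grey levels and that a fixed threshold separates the dot from the background; the function name and the tiny test frame are made up for illustration.

```python
def dot_centroid(image, threshold):
    """Return the (x, y) intensity-weighted centroid of all pixels above threshold.

    image is a list of rows of grey levels. Returns None if no pixel qualifies.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > threshold:
                weight = value - threshold  # ignore the background level
                total += weight
                sum_x += weight * x
                sum_y += weight * y
    if total == 0:
        return None
    return (sum_x / total, sum_y / total)


# Hypothetical 5x4 frame with a bright 2x2 dot.
frame = [
    [10, 10, 10, 10, 10],
    [10, 10, 200, 200, 10],
    [10, 10, 200, 200, 10],
    [10, 10, 10, 10, 10],
]
print(dot_centroid(frame, threshold=50))  # (2.5, 1.5)
```

Because the result is a weighted mean, it lands between pixels, which is how a camera can resolve somewhat better than one pixel if the image is clean and stable.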
Actually I hadn't thought of using a cheap PCB, but it will be easy to find a small silver circle on the green soldermask.
PCB: Soldermask is inaccurate, so make your fiducials (dots) on a bare PCB surface. Don't use HASL, as it is imprecise as well; use either ENIG or chemical tin.
Illumination-wise, the dots should be somewhat matte. For a mirror-like reflective target you need coaxial illumination (beamsplitter); otherwise side illumination is simply reflected away.
I doubt you'll get away with a single dot. With a 5 mm FOV and micrometer (precision? resolution?) you'd need a high resolution sensor and a macro lens that is razor sharp to the edges. Better to have a smaller FOV, for which you need several markers. Using a non-regular pattern (pseudorandom, or maybe just varying distances) you can work out which marker the dot in the FOV is.
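As a rough sketch of both points: the first function is just the FOV-per-pixel arithmetic, the second shows how varying marker spacing lets you identify which pair of markers is in view. The marker positions, the tolerance and the function names are invented for the example.

```python
def um_per_pixel(fov_mm, pixels_across):
    """Object-space size of one pixel, in micrometers."""
    return fov_mm * 1000.0 / pixels_across


def locate_pair(marker_positions_um, measured_gap_um, tolerance_um):
    """Return index i if exactly one adjacent pair (i, i+1) matches the measured gap.

    Returns None when no pair or more than one pair matches.
    """
    matches = []
    for i in range(len(marker_positions_um) - 1):
        gap = marker_positions_um[i + 1] - marker_positions_um[i]
        if abs(gap - measured_gap_um) <= tolerance_um:
            matches.append(i)
    return matches[0] if len(matches) == 1 else None


# A 5 mm FOV needs 5000 pixels across for 1 um per pixel.
print(um_per_pixel(5.0, 5000))  # 1.0

# Hypothetical marker row with gaps of 1000, 1200, 1400 and 1600 um.
markers = [0, 1000, 2200, 3600, 5200]
print(locate_pair(markers, measured_gap_um=1395, tolerance_um=50))  # 2
```

The gaps have to differ by clearly more than the measurement tolerance, otherwise the lookup becomes ambiguous and returns None.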
Anyway I think you need to somehow substantiate your accuracy requirement. A stable 1 um absolute precision is metrology grade and accordingly difficult. At that level, every component you add is a thermometer. So unless you are in a temperature controlled lab, you'll have to be very careful about your construction. A camera would be more difficult because of the overall size of the lens/sensor assembly.
Maybe a stable readout of something like 5 um (RMS noise 1 um) with a repeatability of 10 um and a linearity of 100 um sounds more realistic. You can linearize the measurements if they are stable enough.
It seems you don't intend to follow up on the PSD, so I need to do more advertising. They are kind of exotic and maybe a bit old school, but have very interesting properties useful for specialized uses.
I have only been aware of Hamamatsu, but a quick search yielded other manufacturers as well. They are not cheap (it seems $20-$50 is realistic); ask Hamamatsu for a quote. RS has them in stock (not sure about MOQ), and I also found some on Aliexpress (fake?). Still, they are far less expensive than any decent machine vision lens.
We used them for measuring the settling time (XY) of a high speed precision positioner, to sub-um resolution (as I dimly remember, although I am not sure). The nice property here is that you easily get kHz output bandwidth; try this with a camera (1000 fps high speed camera!). Your use is the opposite of course.
They are ratiometric devices, a property very helpful for precision measurement. You just get a handful of voltages you need to digitize (12 to 16 bit, depending), giving you the "center of gravity" of the beam. No need for fancy illumination, we used a tiny THT LED with a lens, in your case I'd use a metal mount with a tiny hole drilled in front as aperture (I once asked the workshop to drill a 50 um hole for a similar application, which they did on their CNC machine without breaking off a single drill bit).
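For illustration, this is roughly what "ratiometric" means in practice. The sketch assumes a PSD with one electrode pair per axis and the usual difference-over-sum formula; the exact formula and electrode naming depend on the PSD type, so check the datasheet of the actual part. The names and numbers are made up.

```python
def psd_position(i_x1, i_x2, i_y1, i_y2, length_x_mm, length_y_mm):
    """Estimate the spot position (mm from chip center) from four electrode signals.

    Assumed model: position = (L / 2) * (I2 - I1) / (I1 + I2) on each axis.
    """
    sum_x = i_x1 + i_x2
    sum_y = i_y1 + i_y2
    if sum_x <= 0 or sum_y <= 0:
        raise ValueError("no usable signal on the sensor")
    x = 0.5 * length_x_mm * (i_x2 - i_x1) / sum_x
    y = 0.5 * length_y_mm * (i_y2 - i_y1) / sum_y
    return (x, y)


# Hypothetical readings on a 4 mm x 4 mm chip.
print(psd_position(1.0, 3.0, 2.0, 2.0, 4.0, 4.0))  # (1.0, 0.0)

# Halving the light level scales every signal but leaves the position unchanged.
print(psd_position(0.5, 1.5, 1.0, 1.0, 4.0, 4.0))  # (1.0, 0.0)
```

Dividing by the sum is what makes the result largely independent of LED brightness, which is why the illumination does not need to be fancy.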
For better than 100 um accuracy I guess you will need to linearize the readout.
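One simple way to linearize, sketched under the assumption that you can step the target through known positions once and record the raw readout at each: store the pairs as a calibration table and interpolate linearly between them. The table values below are invented, and a polynomial fit would be an equally valid choice.

```python
from bisect import bisect_right


def linearize(raw, cal_raw, cal_true):
    """Map a raw readout to a corrected position by piecewise-linear interpolation.

    cal_raw must be strictly increasing; cal_true holds the reference positions.
    Readings outside the table are extrapolated from the nearest end segment.
    """
    if raw <= cal_raw[0]:
        i = 0
    elif raw >= cal_raw[-1]:
        i = len(cal_raw) - 2
    else:
        i = bisect_right(cal_raw, raw) - 1
    frac = (raw - cal_raw[i]) / (cal_raw[i + 1] - cal_raw[i])
    return cal_true[i] + frac * (cal_true[i + 1] - cal_true[i])


# Hypothetical calibration: raw readout (um) versus reference position (um).
cal_raw = [0.0, 100.0, 200.0, 300.0]
cal_true = [0.0, 125.0, 200.0, 350.0]
print(linearize(150.0, cal_raw, cal_true))  # 162.5
```

This only helps if the raw readout is repeatable; the table corrects a stable nonlinearity, not drift.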
You'll need solid mechanical positioning, most likely machined metal (watch the TC), certainly nothing hot-glued or 3D printed, as this would drift far more than your measurement accuracy within a month.
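To put a number on "watch the TC", here is the linear expansion arithmetic (growth = CTE x length x temperature change). The CTE figures are approximate textbook values that vary by alloy, and the 50 mm length is just an example.

```python
# Approximate linear expansion coefficients in 1/K (typical values, alloy dependent).
CTE_PER_K = {
    "aluminium": 23e-6,
    "carbon steel": 12e-6,
    "invar": 1.2e-6,
}


def thermal_growth_um(length_mm, delta_t_k, material):
    """Length change in micrometers for a given temperature change in kelvin."""
    return CTE_PER_K[material] * length_mm * 1000.0 * delta_t_k


# A hypothetical 50 mm mount warming up by 1 K.
for name in CTE_PER_K:
    print(name, round(thermal_growth_um(50.0, 1.0, name), 3), "um")
```

With these values, 50 mm of aluminium moves by a bit more than 1 um per kelvin, so a 1 um budget is used up by a one-degree change in room temperature.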
The camera solution is more difficult: depending on focal length, the overall camera tube length might be > 100 mm, which means a tiny tilt of the camera assembly (temperature!) might already give an offset.
PSDs are much easier here, as you can place the LED (possibly within a mount with an aperture) very close to the surface of the chip (depending on expected Z movement), so tilt has far less impact.
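The tilt argument in numbers, as a rough sketch: the lateral offset is approximately lever length times tan(tilt). The 10 microradian tilt and the 2 mm LED-to-chip distance are assumed values, picked only to show the ratio.

```python
import math


def tilt_offset_um(lever_mm, tilt_urad):
    """Lateral offset in micrometers caused by a small tilt acting over a lever arm."""
    return lever_mm * 1000.0 * math.tan(tilt_urad * 1e-6)


camera_offset = tilt_offset_um(lever_mm=100.0, tilt_urad=10.0)  # long camera tube
psd_offset = tilt_offset_um(lever_mm=2.0, tilt_urad=10.0)       # LED close to the chip

print(f"camera: {camera_offset:.3f} um")  # roughly 1 um
print(f"PSD:    {psd_offset:.3f} um")     # roughly 0.02 um
```

The offset scales linearly with the lever arm, so the same tilt costs about 50 times less with the LED 2 mm from the chip than with a 100 mm tube.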