Well, of course you have to make sure your timings and latencies work out for the bus you are running across. Buses are usually configured for certain timing settings, so it's just a matter of setting up the bus correctly.
LVDS adds extra latency to the whole thing because serializing and deserializing the data takes time, but that time is always constant, so the bus can be configured for the correct latency. Clock + unidirectional LVDS pairs is the way to go, though you may want more than one pair going each way if you want lots of speed and low latency.
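To put rough numbers on the "more pairs, less latency" point, here is a back-of-the-envelope sketch. The function name and the example figures (32-bit words, 400 Mbit/s per pair) are my own assumptions, and it only counts the time to shift the bits out; the fixed pipeline delay inside a real serializer/deserializer depends on the device and is not included.

```python
import math

def serialization_time_ns(word_bits, pairs, bit_rate_mbps):
    """Time to shift one word across the link, in nanoseconds.

    Assumes the word is split evenly across the LVDS pairs and that
    every pair runs at the same bit rate (both are assumptions).
    """
    bits_per_pair = math.ceil(word_bits / pairs)
    # 1 Mbit/s = 1 bit per microsecond, so bits * 1000 / Mbit/s gives ns.
    return bits_per_pair * 1000 / bit_rate_mbps

# One pair versus four pairs for a 32-bit word at 400 Mbit/s per pair.
one_pair = serialization_time_ns(32, 1, 400)
four_pairs = serialization_time_ns(32, 4, 400)
print(one_pair, four_pairs)
```

Either way the figure is the same for every transfer, which is why it can simply be folded into the bus latency setting.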
As for narrowing the bus down to 16-bit or 8-bit, this is done by the same memory bridge block that you will likely need anyway to sort out the bus timings (internal buses likely run on timings that are too fast for an external link). Various memory bus bridges and adapters are often used inside FPGAs for connecting modules with a mismatched bus type onto the main memory bus. So if you connect a module with an 8-bit interface to a 32-bit bus, there has to be a bus converter in between to make it work. Vendors often provide such bus converters, or provide a tool that builds your bus system automatically, inserting these converters wherever needed like magic (like Altera's SOPC Builder and whatever Xilinx's equivalent tool is called).
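As a behavioural sketch of what such a width converter does (in Python rather than HDL, just to show the idea), one 32-bit access becomes four 8-bit accesses on the narrow side. The function names and the little-endian byte order are assumptions for illustration; a real bridge would follow whatever byte ordering the bus defines.

```python
def split_write(addr, data32):
    """Turn one 32-bit write into four (address, byte) writes.

    Assumes little-endian byte order: the lowest byte goes to the
    lowest address.
    """
    return [(addr + i, (data32 >> (8 * i)) & 0xFF) for i in range(4)]

def merge_read(byte_values):
    """Reassemble four bytes read from the 8-bit side into one 32-bit word."""
    word = 0
    for i, b in enumerate(byte_values):
        word |= (b & 0xFF) << (8 * i)
    return word

beats = split_write(0x100, 0xDEADBEEF)
print([(hex(a), hex(b)) for a, b in beats])
print(hex(merge_read([b for _, b in beats])))
```

The cost is visible right away: every wide access takes four narrow ones, so the narrow side needs four times as many transfers for the same throughput.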
Even so, these bus width/timing adapters are not hard to make yourself, as long as you don't try to support all the advanced functionality of a complex memory bus configuration (transaction interleaving, reordering, buffering, special DMA-related stuff, etc.) that in most cases you will not use or need. So in the end all you need to handle is "Please write X to location Y" and "Please tell me what is at location Y", along with perhaps a busy signal to pause the bus if something clogs up. It's your choice how to get that information across, depending on your speed and performance needs. Heck, you could do it all over a simple SPI bus if high bandwidth is not required.
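Here is a toy model of that minimal protocol, again in Python just to show how little there is to it. Everything in it is made up for illustration: the opcodes, the frame layout (1 opcode byte, 4 address bytes, 4 data bytes for writes, big-endian), and the class and function names. Busy is modelled as the far end returning None until it is ready, so the master just retries.

```python
import struct

OP_WRITE = 0x01  # hypothetical opcode: "please write X to location Y"
OP_READ = 0x02   # hypothetical opcode: "please tell me what is at location Y"

def encode_write(addr, data):
    """Frame for a 32-bit write: opcode, address, data (9 bytes)."""
    return struct.pack(">BII", OP_WRITE, addr, data)

def encode_read(addr):
    """Frame for a 32-bit read: opcode, address (5 bytes)."""
    return struct.pack(">BI", OP_READ, addr)

class RemoteMemory:
    """Far end of the link: a memory that may report busy for a while."""

    def __init__(self, stall=0):
        self.mem = {}
        self.stall = stall   # polls answered with "busy" before each command
        self._left = stall

    def handle(self, frame):
        """Return None while busy, else the result (0 for a write)."""
        if self._left > 0:
            self._left -= 1
            return None
        self._left = self.stall
        if frame[0] == OP_WRITE:
            _, addr, data = struct.unpack(">BII", frame)
            self.mem[addr] = data
            return 0
        if frame[0] == OP_READ:
            _, addr = struct.unpack(">BI", frame)
            return self.mem.get(addr, 0)
        raise ValueError("unknown opcode")

def transact(remote, frame):
    """Master side: keep retrying the same frame while the far end is busy."""
    while True:
        result = remote.handle(frame)
        if result is not None:
            return result

remote = RemoteMemory(stall=2)
transact(remote, encode_write(0x10, 0xCAFE))
print(hex(transact(remote, encode_read(0x10))))
```

The same byte frames could be clocked out over SPI, a UART, or LVDS pairs; the transport only changes how fast they arrive.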