Just use series termination resistors. I would use 4x resistor packs, something like 100 ohm for the 20 MHz data lines and 50 ohm for the 40 MHz clock. If you are using an FPGA, keep the IOs in one bank and make sure you can set the output drive strength (drive current). That basically gives you an equivalent series termination, so you can scrap the series resistors. Also, for FPGAs, use a clock output that you can tune after the fact if necessary, adjusting either the clock output phase or the data output phase relative to it.
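
If you want a starting point for the numbers, here is a rough sketch of the two calculations involved. It uses the usual matching rule (series resistor plus driver output impedance roughly equals the trace impedance), which is my assumption and not necessarily how the values above were picked. The 50 ohm trace and 17 ohm driver impedance are made-up example figures, and the function names are just for illustration; take the real values from your stackup and the device's IBIS model or datasheet.

```python
def series_resistor(z0_ohm, driver_ohm):
    """Series resistor so that driver impedance + resistor ~= trace impedance.

    Returns 0 when the driver alone is already at or above the trace
    impedance (e.g. a low drive-strength setting), meaning no external
    resistor is needed.
    """
    return max(z0_ohm - driver_ohm, 0.0)


def phase_to_delay_ns(phase_deg, clock_mhz):
    """Convert a clock phase shift in degrees to a time delay in nanoseconds."""
    period_ns = 1000.0 / clock_mhz
    return period_ns * phase_deg / 360.0


if __name__ == "__main__":
    # Hypothetical example: 50 ohm trace, 17 ohm driver output impedance.
    print(series_resistor(50.0, 17.0))    # 33.0
    # A 90 degree shift of the 40 MHz clock (25 ns period).
    print(phase_to_delay_ns(90.0, 40.0))  # 6.25
```

The second function is only there to show how much timing margin a phase step buys you: at 40 MHz, each 90 degrees of clock phase moves the edge by 6.25 ns relative to the data.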