On the master side, bit-banging is easy and doesn't need much. You unroll the loop for the entire transmission, and then you only need 2-4 instructions per bit, which means you can easily do about 2 MHz with a PIC16. If you want faster, you need something that can support a faster clock, but you don't want anything with a cache or other features that make instruction timing unpredictable.
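A minimal sketch of the unrolled approach, assuming SPI mode 0 and MSB-first order. The pin helpers (`mosi_write`, `miso_read`, `sck_high`, `sck_low`) and the `sim_*` variables are hypothetical stand-ins that drive a simulated slave so the code runs on a host. On a real MCU each helper would be a single port write or read, and the instruction count per bit depends on the target and the compiler.

```c
#include <stdint.h>

/* Simulated slave state (assumption: stands in for real hardware). */
static uint8_t sim_mosi;       /* current level on MOSI */
static uint8_t sim_slave_in;   /* bits the slave has clocked in so far */
static uint8_t sim_slave_out;  /* byte the slave shifts out, MSB first */

/* Hypothetical pin helpers; replace with port accesses on a real target. */
static inline void mosi_write(uint8_t level) { sim_mosi = level & 1u; }
static inline uint8_t miso_read(void) { return (uint8_t)(sim_slave_out >> 7); }
/* Rising edge: the slave samples MOSI. */
static inline void sck_high(void)
{
    sim_slave_in = (uint8_t)((sim_slave_in << 1) | sim_mosi);
}
/* Falling edge: the slave presents its next bit on MISO. */
static inline void sck_low(void)
{
    sim_slave_out = (uint8_t)(sim_slave_out << 1);
}

/* One bit of the transfer: set data, clock high, sample, clock low. */
#define SPI_BIT(tx, rx, n)                                \
    do {                                                  \
        mosi_write((uint8_t)(((tx) >> (n)) & 1u));        \
        sck_high();                                       \
        (rx) = (uint8_t)(((rx) << 1) | miso_read());      \
        sck_low();                                        \
    } while (0)

/* Fully unrolled 8-bit transfer: no loop counter, no branch per bit. */
static uint8_t spi_transfer_unrolled(uint8_t tx)
{
    uint8_t rx = 0;
    SPI_BIT(tx, rx, 7);
    SPI_BIT(tx, rx, 6);
    SPI_BIT(tx, rx, 5);
    SPI_BIT(tx, rx, 4);
    SPI_BIT(tx, rx, 3);
    SPI_BIT(tx, rx, 2);
    SPI_BIT(tx, rx, 1);
    SPI_BIT(tx, rx, 0);
    return rx;
}
```

If you only need to transmit, dropping the `miso_read()` line shortens each bit further. For a longer frame, repeat the macro for every bit of the frame instead of wrapping it in a loop.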
SPI modules will often work on 8/16/32-bit chunks, but you can always add a few bit-banged bits at the beginning or at the end. Usually, a hardware SPI module will be 2-3 times faster than bit-banging on the same CPU, and probably easier too.
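As a rough illustration of mixing the two, the sketch below sends a frame whose length is not a multiple of 8 by bit-banging the leading odd bits and handing the rest to the module one byte at a time. Everything here is an assumption made for illustration: `bitbang_bit` and `hw_spi_write8` are hypothetical stand-ins that feed a simulated receiver, and a real part would typically also need its pins switched between GPIO and the SPI peripheral, which is device-specific and not shown.

```c
#include <stdint.h>

/* Simulated receiver (assumption): records bits in arrival order. */
static uint32_t sim_rx_bits;
static unsigned sim_rx_count;

static void sim_clock_in(unsigned bit)
{
    sim_rx_bits = (sim_rx_bits << 1) | (bit & 1u);
    sim_rx_count++;
}

/* Hypothetical stand-in for bit-banging one bit on GPIO. */
static void bitbang_bit(unsigned bit) { sim_clock_in(bit); }

/* Hypothetical stand-in for writing one byte to the SPI module. */
static void hw_spi_write8(uint8_t byte)
{
    for (int i = 7; i >= 0; i--)
        sim_clock_in((unsigned)(byte >> i) & 1u);
}

/* Send the low nbits of frame, MSB first (nbits must be 1..32).
   Leading bits that do not fill a byte are bit-banged; the rest
   goes through the module in 8-bit chunks. */
static void spi_send_frame(uint32_t frame, unsigned nbits)
{
    unsigned remaining = nbits;
    unsigned extra = nbits % 8u;

    for (unsigned i = 0; i < extra; i++) {
        remaining--;
        bitbang_bit((unsigned)(frame >> remaining) & 1u);
    }
    while (remaining >= 8u) {
        remaining -= 8u;
        hw_spi_write8((uint8_t)(frame >> remaining));
    }
}
```

For example, `spi_send_frame(0x2A5, 10)` bit-bangs the two leading bits and sends `0xA5` through the module. Moving the bit-banged bits to the end instead only requires swapping the order of the two loops.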
DMA will work just as well; however, small MCUs usually don't have DMA, so you'll have to go bigger. DMA is the easiest way if you want to communicate with both the encoder and the inverter at the same time.
Working as an SPI slave is harder for a CPU because it must adjust to the pace of the external clock. Thus bit-banging will be harder and slower. An SPI module with a buffer will work much better, and an SPI module coupled with DMA will be the best.
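To show why buffering helps on the slave side, here is a sketch of a small software ring buffer that decouples reception from the main loop. This is an assumption about how you might extend the module's hardware buffer, not a description of any particular part: `spi_slave_on_byte` is a hypothetical hook you would call from the SPI receive interrupt, and `spi_slave_read` is what the main loop calls at its own pace.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE 16u  /* power of two, so indices wrap with a mask */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head;  /* written only by the receive handler */
static volatile uint8_t rx_tail;  /* written only by the main loop */

/* Hypothetical hook: call from the SPI receive interrupt with the byte
   read from the module. Returns false if the buffer is full and the
   byte had to be dropped. */
static bool spi_slave_on_byte(uint8_t byte)
{
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_BUF_SIZE - 1u));
    if (next == rx_tail)
        return false;            /* overrun: main loop fell behind */
    rx_buf[rx_head] = byte;
    rx_head = next;
    return true;
}

/* Called from the main loop. Returns false when nothing is pending. */
static bool spi_slave_read(uint8_t *out)
{
    if (rx_tail == rx_head)
        return false;            /* buffer empty */
    *out = rx_buf[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) & (RX_BUF_SIZE - 1u));
    return true;
}
```

The master's clock sets the rate at which bytes arrive, but the CPU only has to keep up on average: it can fall behind by up to 15 bytes with this buffer size before data is lost. With DMA, the handler disappears and the hardware fills the buffer itself.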