You can also change the bit clock sensitivity to:
shift_bit_in <= reg_comms_clk && !dly_comms_clk;
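if you want to trigger exclusively on the rising edge. That expression is true only when the current clock sample is high and the delayed sample is still low.
Whether that is what you want depends on your bit-bang routine.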
Personally, I would do this bare metal in assembly, sending 2 bytes with about 35 lines of code like:
clrf port (Microchip style)
bittestsc byte1,bit0
set port,databit
set port,clk
clr port,databit
bittestsc byte1,bit1
set port,databit
clr port,clk
.....
At 20 MIPS on the MCU, this would transmit at about 4 megabaud.
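To show why the edge choice depends on the bit-bang routine, here is a rough C model of the listing above, not real firmware. It assumes what the listing appears to do: the clock line changes once per bit, so bits arrive on both rising and falling edges. The port is just a variable, and the pin masks, `port_write`, `send_byte` and `receive` are made-up names for illustration. The receiver can count either any clock edge or only rising edges (the `reg_comms_clk && !dly_comms_clk` case).

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_BIT  0x01u   /* hypothetical pin assignment */
#define CLK_BIT   0x02u   /* hypothetical pin assignment */
#define MAX_TRACE 128

static uint8_t port;               /* simulated port latch, not a real register */
static uint8_t trace[MAX_TRACE];   /* port value after every write */
static size_t  trace_len;

/* Write the simulated port and log the new value. */
static void port_write(uint8_t value)
{
    port = value;
    if (trace_len < MAX_TRACE)
        trace[trace_len++] = port;
}

/* Send one byte LSB first, toggling the clock once per bit,
 * following the pattern of the assembly listing. */
static void send_byte(uint8_t byte)
{
    for (int bit = 0; bit < 8; bit++) {
        if (byte & (1u << bit))
            port_write(port | DATA_BIT);          /* set port,databit */
        port_write(port ^ CLK_BIT);               /* set or clr port,clk */
        port_write(port & (uint8_t)~DATA_BIT);    /* clr port,databit */
    }
}

/* Replay the trace as the receiver would see it.
 * rising_only != 0 models reg_comms_clk && !dly_comms_clk;
 * otherwise every clock transition shifts a bit in.
 * Returns the number of bits captured; bits land LSB first in *out. */
static int receive(int rising_only, uint32_t *out)
{
    int count = 0;
    uint8_t dly_clk = 0;   /* clock idles low after clearing the port */

    *out = 0;
    for (size_t i = 0; i < trace_len; i++) {
        uint8_t clk = (trace[i] & CLK_BIT) ? 1u : 0u;
        int rising   = clk && !dly_clk;
        int any_edge = (clk != dly_clk);

        if (rising_only ? rising : any_edge) {
            if (trace[i] & DATA_BIT)
                *out |= (uint32_t)1 << count;
            count++;
        }
        dly_clk = clk;
    }
    return count;
}
```

In this model, after `send_byte(0xA5)` and `send_byte(0x3C)`, `receive(0, &v)` captures all 16 bits (`v == 0x3CA5`), while `receive(1, &v)` captures only 8, because every second bit sits on a falling edge. So with a sender like this one, the rising-edge-only detector would need a routine that pulses the clock high and low for each bit.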