Run 'n' buffered shift registers in parallel (not serial), and pick one bit off from each SR.
Run two shift registers: one controls the data and the other controls the analog switches.
That is how I do it in an FPGA. We have a domain-specific serial protocol. It looks like SPI, but it isn't, because the address field length can change on the fly.
The high-priority address is 1 bit. The second-priority address is 2 bits.
So the frames are 1<32-bit data>, 001<32-bit data>, 010<32-bit data>, 011<32-bit data>, and 000<5-bit addr><16-bit data>.
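To make the framing concrete, here is a rough Python sketch of an encoder and decoder for those five frame shapes. It is only an illustration, not our actual FPGA code: the function names are made up, MSB-first bit order is an assumption, and I am reading "data32" as a 32-bit data field.

```python
# Prefixes taken from the frame list above. The 32-bit data width and the
# MSB-first bit order are assumptions made for this sketch.
SHORT_PREFIXES = ("1", "001", "010", "011")
ESCAPE_PREFIX = "000"  # followed by a 5-bit address and 16-bit data


def encode_short(prefix: str, data: int) -> str:
    """Build a short frame: prefix followed by 32 data bits."""
    if prefix not in SHORT_PREFIXES:
        raise ValueError(f"unknown prefix {prefix!r}")
    if not 0 <= data < (1 << 32):
        raise ValueError("data does not fit in 32 bits")
    return prefix + format(data, "032b")


def encode_long(addr: int, data: int) -> str:
    """Build a long frame: 000, 5-bit address, 16 data bits."""
    if not 0 <= addr < (1 << 5):
        raise ValueError("address does not fit in 5 bits")
    if not 0 <= data < (1 << 16):
        raise ValueError("data does not fit in 16 bits")
    return ESCAPE_PREFIX + format(addr, "05b") + format(data, "016b")


def decode(bits: str):
    """Return (prefix, addr_or_None, data) for one complete frame."""
    if bits[0] == "1":  # 1-bit address: decided on the first bit
        return "1", None, int(bits[1:33], 2)
    prefix = bits[:3]
    if prefix == ESCAPE_PREFIX:  # long form with explicit address
        return prefix, int(bits[3:8], 2), int(bits[8:24], 2)
    return prefix, None, int(bits[3:35], 2)
```

The point of the layout is that the receiver can decide the frame shape after at most three bits, so the most urgent address costs only a single bit of overhead.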
I blast that out using three shifters. The first holds the bit pattern, the second drives the tri-state pin of the buffer after the first shifter, and the third controls the select line.
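A small Python model of the three-shifter idea, in case it helps to see it step by step. This is a behavioural sketch only: the class and function names are hypothetical, the enable is assumed to be active-high, and the example patterns are invented rather than taken from the real design.

```python
from collections import deque


class ShiftRegister:
    """Shifts out one preloaded bit per clock, first bit first."""

    def __init__(self, bits: str):
        self._bits = deque(bits)

    def shift(self) -> str:
        # Emit '0' once the preloaded pattern is exhausted.
        return self._bits.popleft() if self._bits else "0"


def blast(pattern: str, enable: str, select: str):
    """Clock three shifters in lockstep.

    Returns one (line, select) pair per clock, where line is the data bit
    when the buffer is enabled and 'Z' (high impedance) otherwise.
    """
    if not (len(pattern) == len(enable) == len(select)):
        raise ValueError("all three patterns must have the same length")
    data_sr = ShiftRegister(pattern)
    enable_sr = ShiftRegister(enable)
    select_sr = ShiftRegister(select)
    out = []
    for _ in range(len(pattern)):
        d, en, sel = data_sr.shift(), enable_sr.shift(), select_sr.shift()
        line = d if en == "1" else "Z"  # assumed active-high enable
        out.append((line, sel))
    return out
```

For example, `blast("1011", "1101", "0011")` tri-states the line on the third clock while the select bit changes from 0 to 1. Because all three shifters run off the same clock, the data, the buffer enable, and the select stay bit-aligned without any extra sequencing logic.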