The big question is, where and how is the data finally used?
Most of the time, doing all the processing "at once" is the most efficient approach. That is, wherever you are already looping through all items, do everything there. Once a value has been loaded from memory into a CPU register, it usually doesn't matter whether the processing takes 10, 11, or 15 instructions. Loads and stores are slow in comparison; if you have a slower memory interface and depend on caches, the difference is even bigger.
"On-the-fly" processing makes sense when it lets you significantly "pack" the data, i.e., decrease the memory footprint. Unpacking on the fly does the opposite: it increases the footprint, so don't do it.
To me, it sounds like a good idea to keep the efficiently packed 12-bit data for as long as possible and "unpack" it only at the point of use.
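
As a rough sketch of what "unpack at use" could look like: the buffer stays packed in memory, and each sample is widened to 16 bits only inside the loop that consumes it. The bit layout here is an assumption, not something stated above (two 12-bit samples in three bytes, low bits first), and the names `unpack12` and `sum_samples` are made up for illustration; adapt the shifts to whatever layout your ADC or DMA actually produces.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout (hypothetical, adjust to your hardware):
 *   byte 0            = sample A bits 7..0
 *   byte 1 low nibble = sample A bits 11..8
 *   byte 1 high nibble= sample B bits 3..0
 *   byte 2            = sample B bits 11..4
 */
static inline uint16_t unpack12(const uint8_t *packed, size_t i)
{
    /* Each pair of samples occupies 3 bytes. */
    const uint8_t *p = packed + (i / 2) * 3;

    if (i % 2 == 0)
        return (uint16_t)(p[0] | ((p[1] & 0x0F) << 8));
    return (uint16_t)((p[1] >> 4) | (p[2] << 4));
}

/* Example consumer: unpacking and processing happen in the same pass,
 * so no unpacked copy of the buffer is ever written back to memory. */
uint32_t sum_samples(const uint8_t *packed, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += unpack12(packed, i);
    return sum;
}
```

The extra shifts and masks cost a few instructions per sample, while the packed buffer needs only 1.5 bytes per sample instead of 2, so there is less memory traffic to pay for.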