Hi there!
Back in the days when dual-CPU (not dual-core!) systems were the very top of the line, I did some video editing...
I'm using Linux, so maybe you can't do it the same way...
Like you, I used a two-step approach: render first, and once everything looked fine, transcode to VCD.
At the very beginning the raw data was MJPEG. When I set up my rendering software to produce exactly the same format for the rendered video, every part that was not changed by overlays etc. was just copied. So basically my old IDE RAID was the bottleneck... damn, I really would have liked a bunch of those fancy SCSI disks.
Some years later I used to cut and transcode movies captured from my DVB-S card (some German stations delivered 30(!) Mbit/s MPEG-2). If nothing was changed between two key frames, that part was just copied as it was, same as with the old MJPEG videos. Transcoding a 2-hour movie to Xvid was an overnight job...
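To make the "copy what did not change" idea a bit more concrete, here is a minimal sketch in Python. It is not what my old tools did internally, just the logic as I understand it: split the video at its key frames and only re-encode the segments that an edit actually touches. The function name `plan_segments` and the timestamps are made up for illustration.

```python
def plan_segments(keyframes, duration, edits):
    """Mark each key-frame-to-key-frame segment as 'copy' or 'encode'.

    keyframes: sorted key frame timestamps in seconds (first one at 0.0)
    duration:  total length of the video in seconds
    edits:     list of (start, end) time ranges changed by overlays, cuts, ...
    """
    # Segment boundaries are the key frames plus the end of the video.
    bounds = list(keyframes) + [duration]
    plan = []
    for start, end in zip(bounds, bounds[1:]):
        # A segment must be re-encoded if any edit overlaps it.
        touched = any(e_start < end and e_end > start for e_start, e_end in edits)
        plan.append((start, end, "encode" if touched else "copy"))
    return plan


if __name__ == "__main__":
    # Hypothetical clip: key frame every 0.5 s, one overlay from 0.6 s to 1.1 s.
    for segment in plan_segments([0.0, 0.5, 1.0, 1.5], 2.0, [(0.6, 1.1)]):
        print(segment)
```

In this made-up example only the two middle segments need the encoder; the first and last are passed through untouched, which is why the disks end up being the limit.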
So maybe you can do something similar and save at least some rendering time, but I'm not sure you can get this working (bloody DirectShow... never liked it as a programmer...).
I have some experience in GPU programming and found that using GPUs efficiently is not very easy. The major problem is that you load your input data into system RAM, then everything is copied to the card's RAM. After everything is computed you need to copy it back to system RAM and save it to your HDD. Together with syncing your "threads" on the card and the CPU, this results in a huge overhead...
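A rough back-of-the-envelope model of that overhead, as a Python sketch. All names and numbers here are assumptions for illustration, not measurements, and I simply assume the result is as big as the input:

```python
def gpu_job_time(data_bytes, bus_bytes_per_s, gpu_compute_s, sync_overhead_s=0.0):
    """Estimate wall time of one GPU job including both copies over the bus."""
    upload = data_bytes / bus_bytes_per_s    # system RAM -> card RAM
    download = data_bytes / bus_bytes_per_s  # card RAM -> system RAM (assumed same size)
    return upload + gpu_compute_s + download + sync_overhead_s


def gpu_is_worth_it(cpu_compute_s, **gpu_job):
    """True only if the GPU job, copies included, beats the plain CPU time."""
    return gpu_job_time(**gpu_job) < cpu_compute_s


if __name__ == "__main__":
    # Hypothetical job: 8 GB of data, 8 GB/s effective bus speed,
    # 0.5 s of GPU compute, 0.25 s lost to synchronisation.
    job = dict(data_bytes=8_000_000_000, bus_bytes_per_s=8_000_000_000,
               gpu_compute_s=0.5, sync_overhead_s=0.25)
    print(gpu_job_time(**job))          # 2.75
    print(gpu_is_worth_it(2.0, **job))  # False
```

So in this invented case the GPU computes four times faster than the CPU and still loses, just because of the two copies and the syncing.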
Actually it seems that you need special GPU codecs like NVENC from Nvidia in order to get MPEG-4 video transcoding faster on a GPU than on a CPU... but that was just a quick Google search, since I was curious how this has progressed over the last years...
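If you want to try NVENC outside your editing software, one way I know of is ffmpeg's `h264_nvenc` encoder. The sketch below only builds the command line in Python; it assumes an ffmpeg build with NVENC support and a card that offers it, the file names and bitrate are placeholders, and note that it produces H.264 rather than Xvid-style MPEG-4.

```python
import subprocess


def nvenc_command(src, dst, bitrate="4M"):
    """Build an ffmpeg call that encodes video with NVENC and copies the audio."""
    return [
        "ffmpeg",
        "-i", src,             # input file
        "-c:v", "h264_nvenc",  # hardware H.264 encoder on the Nvidia card
        "-b:v", bitrate,       # target video bitrate (placeholder value)
        "-c:a", "copy",        # pass the audio stream through unchanged
        dst,
    ]


if __name__ == "__main__":
    cmd = nvenc_command("input.mpg", "output.mp4")  # hypothetical file names
    print(" ".join(cmd))
    # Uncomment to actually run it (needs ffmpeg with NVENC on your machine):
    # subprocess.run(cmd, check=True)
```

I have not benchmarked this myself, so treat it as a starting point for your own test.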
I'm sure you already ran some generic benchmarks to check that your system is configured correctly (dual-channel RAM access, PCIe speeds set correctly, ...). A misconfiguration there could also push down your rendering speed (I had that on my laptop...). But I think you have to use your GPU's special rendering cores instead of some generic GPU implementation... at least on paper the performance of your GeForce sounds massive!
Cheers from Austria,
Hans