Ok, it's the quadrilateral sorting that is creating too much logic: when I remove it and just use the triangle sort, your FMAX's limiting factor moves to the linegens instead. The quad needs to be sorted in 2 steps.
Also, we need to fix up the command pipe, as none of its functions, like bypassing the sort, the geo-shape and the color, are being used by the next function.
AND, the command pipe pass-through (lines 643-644) also needs to be moved to after line 646.
I recommend keeping the triangle sort as the first stage in the pipe. Then, for the second stage, if a quad is selected: only if y[2] >= y[3], swap xy[2] & xy[3], or, if y[0] >= y[3], swap xy[0] and xy[3].
Remember, the middle 2 coordinates don't need their y sorted, since the first 3 have already been sorted on the previous clock.
I'm assuming you are sorting y[0] minimum, y[3] maximum.
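To check the 2-step idea before touching the HDL, here is a rough behavioral model in Python. It is only a sketch, not your actual code: the function names are made up, each vertex is an (x, y) tuple, and I am assuming the second stage does the y[0]/y[3] test first and the y[2]/y[3] test second, so that a fourth point lying below all the others still ends up in slot 0.

```python
from itertools import product

def stage1_triangle_sort(v):
    """Clock 1 (hypothetical name): sort vertices 0..2 by y, pass vertex 3 through."""
    a, b, c, d = v
    a, b, c = sorted((a, b, c), key=lambda p: p[1])
    return [a, b, c, d]

def stage2_place_fourth(v, is_quad=True):
    """Clock 2 (hypothetical name): only y[3] is compared, against y[0] and y[2]."""
    v = list(v)
    if not is_quad:
        return v                      # triangle: vertex 3 is unused, leave as-is
    if v[0][1] >= v[3][1]:            # fourth point is the new minimum
        v[0], v[3] = v[3], v[0]
    if v[2][1] >= v[3][1]:            # whatever sits in slot 2 is the maximum
        v[2], v[3] = v[3], v[2]
    return v

def sort_quad(v):
    return stage2_place_fourth(stage1_triangle_sort(v))

def exhaustive_check():
    """Try every y combination in 0..3 and confirm min lands in 0, max in 3."""
    for ys in product(range(4), repeat=4):
        verts = [(10 * i, y) for i, y in enumerate(ys)]
        out = sort_quad(verts)
        if out[0][1] != min(ys) or out[3][1] != max(ys):
            return False
        if sorted(out) != sorted(verts):   # no vertex lost or duplicated
            return False
    return True

print(exhaustive_check())
print(sort_quad([(0, 5), (1, 7), (2, 6), (3, 1)]))
# -> [(3, 1), (2, 6), (0, 5), (1, 7)]   (middle two are left unsorted)
```

In this model the second swap fires whenever either comparison on the stage-1 outputs is true (if y[0] >= y[3], then y[2] >= y[0] already holds from the first clock), so in hardware both comparators should be able to run in parallel off the registered stage-1 values rather than chaining one after the other.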