No belt-weaving; input comes in from below and output goes back out below. There is one row to spare without breaking the beacon grid, and I use it to let the second assembler output across to the far side, so that its output lands on the right side of the belt. Unfortunately the build runs short of iron, and I'm at a loss as to how to give it a second iron inserter. Still, it achieves 2.3-2.4k/min, which is quite close to the maximum throughput of a blue belt (2.4k/min).
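For a rough sense of how close that is, here is a quick back-of-the-envelope sketch. It assumes the 2.4k figure means items per minute, i.e. a blue belt moving 40 items per second; that rate and the function names are my own assumptions for illustration, not something measured from the build.

```python
# Assumption: a blue (express) belt carries 40 items/s, which matches
# the 2.4k figure when read as items per minute.
BELT_ITEMS_PER_SECOND = 40


def belt_capacity_per_minute(items_per_second=BELT_ITEMS_PER_SECOND):
    """Convert a belt's per-second throughput to items per minute."""
    return items_per_second * 60


def utilisation(measured_per_minute, items_per_second=BELT_ITEMS_PER_SECOND):
    """Fraction of a full belt that the measured output fills."""
    return measured_per_minute / belt_capacity_per_minute(items_per_second)


if __name__ == "__main__":
    # The two ends of the observed output range.
    for measured in (2300, 2400):
        print(f"{measured}/min -> {utilisation(measured):.1%} of a full belt")
```

Under that assumption the low end of the range fills about 95.8% of the belt, so the iron shortage costs at most roughly 4% of throughput.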
Blueprint