Then I thought: Why not also (un)load the waiting train?

The basic idea is to add an identical train stop directly behind the existing one, with a circuit connection that enables it only when the signal ahead of it is not green. This can cause some false positives: a train may select the secondary station even when no train is in the primary one, e.g. because that train is just leaving. Still, in a running high-throughput experiment it seems to work pretty well. You could instead condition it on a train actually being in the primary station, but that creates a lot of false negatives: it takes a (relatively) long time for a train to actually stop at the station, so in a high-throughput situation the next train may already be waiting in the waiting bay by the time the secondary station is activated, which is too late.
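A minimal Python model of the enable logic described above, just to make the condition explicit. It assumes the rail signal is set to read its state onto the circuit network and emits one of the virtual signals `signal-green`, `signal-yellow` or `signal-red` with value 1; the function names here are hypothetical, not part of any game API. The last call shows the false-positive case: the primary train is leaving but still holds the block, so the secondary stop is enabled.

```python
from enum import Enum


class SignalState(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def secondary_stop_enabled(signal_ahead: SignalState) -> bool:
    # Secondary stop is enabled whenever the signal ahead of it is not green.
    return signal_ahead is not SignalState.GREEN


def circuit_condition(signals: dict[str, int]) -> bool:
    # Assumed in-game equivalent: enable condition "signal-green = 0"
    # on the values read from the rail signal.
    return signals.get("signal-green", 0) == 0


# Primary occupied -> signal red -> secondary stop enabled.
print(secondary_stop_enabled(SignalState.RED))
# Primary empty -> signal green -> secondary stop disabled.
print(circuit_condition({"signal-green": 1}))
# False positive: primary train just leaving, block still red -> still enabled.
print(circuit_condition({"signal-red": 1}))
```

Conditioning on the signal rather than on a train being parked at the primary stop trades occasional false positives for an earlier activation, which is the trade-off described above.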
As you can see, 4 double stations can just about process two fully saturated input tracks, which would otherwise require 6-8 single stations. So I think a double station has about 150% of the throughput of a normal station plus waiting bay, in exactly the same area (but of course with higher material cost, and potentially longer bot travel distances and more buffering).
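A quick check of the estimate above, taking the post's own numbers (4 double stations doing the work of 6-8 single stations) at face value. The helper name is made up for illustration. The range works out to 150% to 200%, so "about 150%" is the conservative end.

```python
from fractions import Fraction


def relative_throughput(double_stations: int, single_equivalents: int) -> Fraction:
    """Throughput of one double station, in units of one single station plus waiting bay."""
    return Fraction(single_equivalents, double_stations)


low = relative_throughput(4, 6)
high = relative_throughput(4, 8)
print(f"{float(low):.0%} to {float(high):.0%}")  # 150% to 200%
```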
Edit: close-up of station 'design' (the GIF is a bit too blurry to see details, not that there are many):

This shows the 'secondary' station linked to the signal. Both stations are now occupied, the top engine is the last (pushing) engine of the top train, the bottom engine is the first engine of the bottom train.