Interesting to look at! Despite the two massive, chaotic congestions, both stabilize in around the same time.
This wiki page may help you understand a bit more about the trains that end up out of place at the end:
https://wiki.factorio.com/Railway/Train_path_finding
My guess would be that this depot is "too big", in the sense that some trains seemingly prefer a path occupied by another train over a looooong way around to a free depot slot.
I would expect the pathfinding penalty for waiting (mistakenly) behind another train or in the main lane not to be high enough in this case to make them look for another path, or the other path to be "too long", so that the train prefers to take its chances waiting behind another train rather than going all the way around.
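To make that guess a bit more concrete, here is a toy sketch of the trade-off in Python. It assumes the pathfinder simply picks the cheapest path, where cost is length plus penalties. The function names and every number in it are made up for illustration; the real penalty values and rules are the ones on the wiki page above.

```python
# Toy model only: all lengths and penalties are hypothetical,
# not the game's real pathfinding values.

def path_cost(length_tiles, penalties=()):
    """Cost of a candidate path: its length plus any penalties along it."""
    return length_tiles + sum(penalties)

def pick_path(candidates):
    """Return the name of the cheapest candidate path."""
    return min(candidates, key=lambda name: path_cost(*candidates[name]))

candidates = {
    # Short way to a slot, but another train is waiting on it
    # (hypothetical penalty of 500).
    "near_but_blocked": (300, [500]),
    # Free slot at the far end of the depot: no penalty, long detour.
    "far_but_free": (1200, []),
}

print(pick_path(candidates))  # near_but_blocked (800 < 1200)
```

With these invented numbers the blocked path still wins, which is the behaviour I suspect in the gifs. Shrink the detour below 800 and the free slot wins instead, which is why a smaller depot might not show the problem.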
To test this hypothesis, you could manually force all trains to repath once the system has stabilized, either by adding another station with the same name as the others somewhere on the map, or by having one that opens and closes. This would force the "stuck" trains to repath, and if they do not repath toward a free slot but instead keep waiting for the slot they "perceive" as closer but which is in reality occupied, I would see it as a confirmation.
Another hint would be the time limit. It's been quite a while since I last used LTN and it may have changed, but I don't think it is supposed to trigger in regular operation. I thought it should only act when extraordinary events occur, such as biters eating rails, because if it 'can' happen otherwise, it 'will' happen: there are enough trains for it to occur often.
This particular system seems to have proportion problems: what takes the most time for a train is waiting for the train ahead of it, while it could be rolling far away to get materials. That makes it mesmerizing to watch, but it will show its limits if the aim is to carry resources around in a reliable timeframe.
Also, following the same hypothesis, those trains might not get stuck if you keep the system under a certain usage threshold at any given time, since there would be more possibilities for trains to find a closer path that would actually trigger the repathing, instead of the only open bays left being too far from the wrong parking position to be detected.
The main bottleneck I see is the entrance lane of the depot. It is like comparing a country road and a highway lane: each has a maximum throughput, reached when everyone closely follows the braking distance at maximal speed. It could even be the same number of cars per hour, but the traffic would look different: on one, slow-moving, tightly packed cars; on the other, cars going very fast with big gaps between them. Some useful data are:
[1] the minimum time it takes for 2 cars to clear a block in a row, which depends on
[1a] speed,
[1b] size of the block and
[1c] braking distance (it is hard-capped in game),
[2] the total number of cars you expect to clear this block in a row (12000 cars, or 160 trains), and
[3] the total timeframe available (the whole day, two hours, 2 minutes).
For cars as for trains, it is very suboptimal when the flow is poorly distributed over the time period: traffic jams at peak and empty roads otherwise. So the gifs are showing extreme cases, but even with 2 minutes of average traffic flow there would still be congestion, I think. In this particular situation, out of [1a], [1b], [1c], [2] and [3], the only parameter that looks problematic to me is [2], given that [1x] and [3] are already set.
You could analyse the exceeding of the time limit as the consequence of the system not being able to meet the objective [2] given [3], while [1x] is capped by fuel, train composition, braking distance research, and the number of rails between signals. The congestion is an aggravating factor because it impacts [1x]: very negatively for [1a], and marginally positively for [1b] and [1c]. It's an overall shift away from the optimal flow, and intuitively I think no reasonable [1x] can be found that meets [2] within [3].
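As a rough back-of-the-envelope check of [1], [2] and [3], something like the Python sketch below could be used. The 160 trains and the 2 minutes are the figures mentioned above; the 3 seconds between two trains at the entrance is purely a guess of mine, so measure your own value in game before trusting the result.

```python
def min_clear_time(n_trains, headway_s):
    """Shortest time for n_trains to pass the entrance block one after another."""
    return n_trains * headway_s

def required_headway(n_trains, timeframe_s):
    """Largest gap between two trains that still fits them all in the timeframe."""
    return timeframe_s / n_trains

n_trains = 160      # [2], from the post
timeframe_s = 120   # [3], the "2 minutes" case
headway_s = 3.0     # [1], hypothetical: measure it in your own depot

print(min_clear_time(n_trains, headway_s))      # seconds needed at best
print(required_headway(n_trains, timeframe_s))  # seconds per train you would need
```

If the time needed at best is larger than the timeframe, no amount of pathfinding tuning will fix it; only fewer trains per entrance, a longer timeframe, or a shorter gap between trains can.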
A visual metaphor would be: there is too much water in the tank for too small a pipe anyway, so the outcome is that it takes more time for all the water to drain. Wondering about the air bubbles and their impact on the flow is secondary, because you would need a physically impossible water speed to empty the tank within a day.
In my view this argues for 4x40, with the depots called [D1] to [D4] and located at each corner instead of in the middle of the structure, because it makes designing a single depot way easier and naturally distributes traffic without worrying about counter-intuitive results from the pathfinding rules, giving instead a simpler relation between distance and time limit. A smoother average speed, traffic considered, is always good.