BenSeidel wrote:I have spent the last 6 years researching and writing a dynamic application whose structure is not known at compile time.
6 years of researching and yet...
But anyway, Clocks (and clock boundaries) are an extremely powerful technique that I have never seen applied in software.
You might want to try to learn how to use Google.
Java has a class dedicated to "clocks" (cyclic barriers):
https://docs.oracle.com/javase/7/docs/a ... rrier.html
Make that two classes:
https://docs.oracle.com/javase/7/docs/a ... haser.html
(paper with benchmarks here:
http://www.cs.rice.edu/%7Evs3/PDF/SPSS08-phasers.pdf)
And later in this thread you provide a Java program demonstrating "your" technique... but don't even bother to use either of the above two built-in Java classes designed specifically for this purpose? Oversights like that suggest that you really don't have any idea what many other programmers are or are not using in their software.
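For anyone unfamiliar with the API, here's a minimal sketch of the "clock boundary" pattern using CyclicBarrier. This is my own toy example (the class name, the buffer-swap workload, and the constants are invented, not taken from BenSeidel's program):

```java
import java.util.concurrent.CyclicBarrier;

// Each worker reads only the "current" buffer and writes only its own slot of
// the "next" buffer; the barrier is the clock edge separating the two phases.
public class ClockedWorkers {
    static final int WORKERS = 4;
    static final int TICKS = 10;
    static int[] current = new int[WORKERS];
    static int[] next = new int[WORKERS];

    public static void main(String[] args) {
        // The barrier action runs exactly once per tick, after every worker has
        // arrived: swap the buffers so the next phase reads this phase's writes.
        CyclicBarrier clock = new CyclicBarrier(WORKERS, () -> {
            int[] tmp = current;
            current = next;
            next = tmp;
        });

        for (int id = 0; id < WORKERS; id++) {
            final int me = id;
            new Thread(() -> {
                try {
                    for (int tick = 0; tick < TICKS; tick++) {
                        // Read phase: only values written in previous ticks.
                        int left = current[(me + WORKERS - 1) % WORKERS];
                        // Write phase: only my own slot of the next buffer.
                        next[me] = left + 1;
                        clock.await();   // the "clock boundary"
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }).start();
        }
    }
}
```

Phaser does the same job while also allowing the number of participating threads to change between phases.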
Not until mainstream games require at least 16+ cores will you see Intel making them standard.
I'm not sure I can wrap my head around your level of delusion. Please reread what you wrote and see if you now understand why it makes absolutely zero sense.
No software that requires N cores can possibly be "mainstream" until
after N cores are "standard" (i.e. most mainstream gamers have >= N cores in their box). You're essentially asking for causality to be violated. If you want to push the envelope by
requiring a certain number of cores, it has to be with niche software, not mainstream software.
Luckily, requiring N cores is not, um, required. You can push the envelope simply by having games (and other popular applications) that don't
require N cores to run, but offer significant advantages to those players who do have N cores. For example, a role-playing game might run on standard hardware and support X NPCs over Z square miles of land. But if the player has four times as much computing power and memory, maybe that same game can support 4*X NPCs over 4*Z square miles of land (supporting more kingdoms, races, quests, etc.), giving the player a much bigger world to play in (see the sketch below). (In theory a "properly parallelized" Factorio could do the same, but frankly I don't find making Factorio mega-factories for the sake of making mega-factories to be compelling. IMO the Factorio devs should stop wasting any of their limited time making stuff go faster and instead fix their completely boring/broken gameplay, but I have no false hope of that actually happening. I.e., vanilla Factorio is a decent sandbox, and even a pretty good sandbox with mods, but it's a really poor game.)
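Here's a rough sketch of that "scale with the hardware instead of requiring it" idea. This is my own hypothetical illustration (the class, the method names, and the numbers are invented, not from any real game):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the world size grows with the machine, but a
// single-core box still gets a playable (smaller) world.
public class ScalableWorld {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        int baselineNpcs = 1_000;                         // the "X" from above
        int npcs = baselineNpcs * Math.max(1, cores / 2); // more cores, bigger world

        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < npcs; i++) {
            final int npcId = i;
            pool.submit(() -> updateNpc(npcId));          // one tick of AI per NPC
        }
        pool.shutdown();
    }

    static void updateNpc(int id) {
        // Placeholder for per-NPC simulation work.
    }
}
```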
They follow where software developers lead.
More nonsense, and especially obnoxious nonsense in a thread about multi-core systems. Intel didn't start making multi-core systems because "software developers lead"; they did it
despite the fact that software developers were (by and large) not even ready for it. And in the consumer space they've gone from (mostly) 2-core systems to mostly 4-core systems without software really "leading the way" or even catching up. And when they start pumping out 8-core consumer systems en masse, software will still likely be lagging. (And how does it make sense for you to contradict the whole point of your own thread? Oh, that's right, because your entire argument is self-contradictory.)
Software developers, hardware developers, users - it's all an ecosystem. Anyone in that ecosystem can
try to lead (even users by expressing demand loudly enough), but that means investing in the effort, and people don't invest in things unless they believe that investment will pay off. In terms of how many cores CPUs have, Intel is a bit of a "reluctant leader" - they didn't really want to lead this way, but they needed to make performance numbers keep growing and single-core was out of gas.
If you think that it can't be done or that software is just "too complex" then I implore you to learn VHDL and see what chip designers have been doing since the 1980s.
With today's tools, writing
good (efficient) parallel code in anything other than trivial (little/no fine-grain interaction) cases does tend to be "too complex". And I rebuff your imploring by noting that I've known VHDL for many years (and used it both on and off the job). Note that the fact that you can make something run in parallel does not mean you have created good (highly optimized) code. And lack of optimization (e.g. just copying all of memory, only handling the minimum "unit of work" size rather than special-casing center vs edge and tiling for best cache use) is contributing significantly to the simplicity of your specific take on this general principle (barriers/phases + per-phase isolation).
Optimized code of this nature really requires language/tool support to be feasible at all for most "mainstream" programmers, or to be considered practical for complex projects even for those of us who don't
technically need such support (which is one of the reasons I happen to be working on my own language+compiler that supports semi-automatic parallelization of code of this sort). "Unoptimized" (simple) code might be good enough in some cases (maybe even optimal), but unacceptable or not worth it in others (potentially even worse than not parallelizing at all, especially if some users are on low-end hardware).
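To make the optimization point concrete, here's my own illustration (Java, with arbitrary grid size, tile size, and kernel; not anyone's real engine code) of the gap between the "minimum unit of work" version and an interior/edge split with tiling:

```java
// Two ways to write the same neighbour-sum pass over a double-buffered grid.
public class GridUpdate {
    static final int N = 1024;     // arbitrary grid size
    static final int TILE = 64;    // arbitrary tile size for cache blocking

    // Naive "minimum unit of work": every cell pays for bounds checks.
    static void naive(int[][] src, int[][] dst) {
        for (int y = 0; y < N; y++)
            for (int x = 0; x < N; x++) {
                int sum = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = y + dy, nx = x + dx;
                        if (ny >= 0 && ny < N && nx >= 0 && nx < N)
                            sum += src[ny][nx];
                    }
                dst[y][x] = sum;
            }
    }

    // Interior handled in cache-friendly tiles with no bounds checks; only the
    // one-cell border still needs the checked version above.
    static void tiledInterior(int[][] src, int[][] dst) {
        for (int ty = 1; ty < N - 1; ty += TILE)
            for (int tx = 1; tx < N - 1; tx += TILE)
                for (int y = ty; y < Math.min(ty + TILE, N - 1); y++)
                    for (int x = tx; x < Math.min(tx + TILE, N - 1); x++)
                        dst[y][x] = src[y - 1][x - 1] + src[y - 1][x] + src[y - 1][x + 1]
                                  + src[y][x - 1]     + src[y][x]     + src[y][x + 1]
                                  + src[y + 1][x - 1] + src[y + 1][x] + src[y + 1][x + 1];
    }
}
```

Same result, but the tiled version avoids the per-cell bounds checks and is friendlier to the cache, and it's the version that takes real engineering effort (and gets much hairier once you add threads, SIMD, and uneven chunk boundaries).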
They are micro-parallelisms, allowing sections of your code to run in parallel where possible, because simply put, software developers are too lazy to do it themselves. Who remembers the days of having to put no-ops after a branch because the CPUs didn't do it?
Programmers have had no experience programming things in parallel and hence fear it.
Right. It couldn't possibly be that we make rational decisions based on actual knowledge (including first-hand experience), based on the difficulty and other properties of the specific problem domain we're working on, based on the opportunity-cost of not doing other things, and based on what customers are willing to accept (pay money for). Programmers are all just irrational lazy cowards. Very convincing argument.
And, of course, never mind that modern CPUs provide instruction-level parallelism
not because programmers are too lazy but because there is
no other way in existing CPUs for programmers to efficiently express/implement such micro-parallelism. And never mind that the need for no-ops after branches was dropped not because programmers were lazy, but because there was often nothing useful to put there (resulting in worse instruction-cache, fetch, and decode performance than an auto-inserted stall), and because the fixed single slot was of such limited usefulness that high-performance CPUs had to do dynamic instruction scheduling
anyway, since branch-mispredict costs greatly exceeded the fraction of a cycle a single branch slot could fill. You're just rewriting history in a way that doesn't reflect reality.
To me it's all wasted real estate.
Tragedy of the commons. (Are you offering to rework all of the code that benefits from that hardware so it's no longer needed? Didn't think so.)
Anyways, if you want to see a much better direction that processors might go, check out some of the
Mill CPU talks. That thing is
insanely better than x86/ARM/MIPS/etc. Much better performance,
much better ILP, with less "overhead" hardware (it gets better ILP than an x86 using dynamic instruction scheduling, and does so using static instruction scheduling, so it needs no dynamic instruction scheduling hardware - how they do that is explained in the videos), and they claim to need far less energy/operation. (All claims except the last are "obviously true" just from the information in their jaw-dropping videos. The energy efficiency claim will be proven one way or the other once they're running on real silicon. These dudes have some serious brains and are making Intel/AMD/ARM all look kind of stupid by comparison.)
Do you know what the best part is? With their top end being able to dispatch 32 "instructions" (which they call "operations", but they are similar in work-amount to instructions on "normal" CPUs) every single cycle
per core, they will make sticking with single-threaded applications viable even longer into the future.
