I'm pretty new to Factorio, but I have designed CPUs for fun in a few other contexts, so I decided to sling one together out of combinators. I know a bunch of people have done so already, but here's my take on it. This is still a development and concept project, not a final release for general use, and may contain bugs.
The whole spread-out "development version" of the CPU looks like this:
My requirements for this to be effective are:
- Cheap and easy to build from blueprints.
- Fast enough to be useful.
- Simple enough to be easily programmed by hand.
- Vanilla functional components (I did use Foreman for Blueprint export and Color Coding to mark areas).
After thinking through this a lot, and a few iterations, the processor now has these features:
- Harvard architecture - executes a program out of a separate instruction ROM.
- 10-way "Barrel" Multithreading (I'll explain why in a bit).
- 60 instructions per second (1 instruction per tick) peak throughput when all threads are active.
- RISC-like instruction set - each instruction reads up to 3 memory locations (two direct, one indirect) and writes one result.
- Data may contain all signals except the black colour signal. Most operations (e.g. addition, subtraction) act on all signals in parallel.
- This is a relatively small build in terms of combinator count. Once compacted, the final usable CPU with 120 registers is likely to take 4 substations' area.
- The barrel architecture lends itself to an extremely dense register bank design, storing 6 registers per thread (60 total registers) in one substation's coverage area.
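To make the barrel scheduling concrete, here is a minimal Python sketch of how the threads might take turns. It assumes simple round-robin issue (thread 1 on tick 0, thread 2 on tick 1, and so on), which the post does not spell out; the function names are made up for illustration.

```python
TICKS_PER_SECOND = 60  # Factorio game ticks per second
THREADS = 10           # barrel width described in the post


def active_thread(tick):
    """Thread ID (1..10) that issues on a given tick.

    Assumption: plain round-robin, with thread 1 issuing on tick 0.
    """
    return tick % THREADS + 1


def instructions_per_second(active_threads):
    """Total throughput when only some threads run a program.

    Each thread owns one slot in every THREADS ticks, so an idle
    thread's slot is simply wasted.
    """
    return active_threads * TICKS_PER_SECOND // THREADS


schedule = [active_thread(t) for t in range(12)]
# schedule is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2]
```

With all 10 threads busy this gives the 60 instructions per second peak, while a single thread on its own gets 6 per second.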
The downsides of this design include:
- The architecture must be fully pipelined to run at this speed. It therefore has to follow some rigid latency constraints. This makes adding custom operations somewhat harder.
- Each single thread can run at only 1/10 of the total instruction throughput.
- The memory requires three read lines and one write line, potentially all in use in a single tick, meaning we need a four-pole memory bus.
- This also increases the size of basic memory registers from 4 combinators to 6 combinators (50% larger). Data ROM on the memory bus takes 3.5 combinators per address, more than double that of a single-read design like that used for instructions.
- The CPU is relatively hard to debug, as it is free running, can't be stopped and is always in a changing state.
This is a standard 12-register bank shared between all threads. The design is based on XKnight's memory cell, extended to triple read and independent write in a single tick:
However, if the registers are not shared between threads, only 1/10th of them need to be read or written in any given cycle. We can therefore store the values in 10-tick delay lines and share all of the read/write hardware. Each register appears completely separate to each thread, and the bank stores a total of 60 values in nearly the same area as the 12 registers above:
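The delay-line idea can be modelled in a few lines of Python. This is only a behavioural sketch, not the combinator layout: it assumes one value emerges from the line per tick, belongs to the thread active on that tick, and is either recirculated or replaced by a write. The class name `DelayLineRegister` is hypothetical.

```python
from collections import deque


class DelayLineRegister:
    """One logical register that holds a separate value for each thread."""

    def __init__(self, threads=10):
        # One slot per thread; the leftmost slot belongs to the thread
        # that is active on the current tick.
        self.line = deque([0] * threads, maxlen=threads)

    def tick(self, write=None):
        """Advance one tick: return the active thread's value and
        recirculate it, or replace it if a write is given."""
        value = self.line.popleft()
        self.line.append(value if write is None else write)
        return value


reg = DelayLineRegister()

# First rotation: each of the 10 threads writes its own value.
for thread in range(1, 11):
    reg.tick(write=thread * 100)

# Second rotation: each thread reads back only what it wrote.
readback = [reg.tick() for _ in range(10)]
```

Six such lines sharing one set of read/write logic would give the 6 registers per thread (60 values) mentioned above.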
The instructions available are:
- 0 or missing: No-op
- 1: Add (all signals R + S)
- 2: Subtract (all signals R - S)
- 3: Thread ID (Number from 1 to 10 identifying the current thread. This is used to find the right program to run.)
- 8: Not equal (output 1 for each signal in R and S where R != S, pass zero S for an "is set" operation)
- 9: Greater than (output 1 for each signal in R and S where R > S, pass zero S for an "is positive" operation)
- 10: Less than (output 1 for each signal in R and S where R < S, pass zero S for an "is negative" operation)
- 11: Filter signals (output the value of each signal in R where those signals are non-zero in S). Credit to Megatron for this design: viewtopic.php?f=5&t=30643#p193765
- 16: Total across (sum all signals in R, output as signal A)
- 17: Multiply out (multiply each signal in R by the A signal in S).
- 18: Divide out (divide each signal in R by the A signal in S).
- 24: Indirect load (load the memory at the address in the A signal of R)
- 25: Indirect store (store S at the address in the A signal of R)
These are all implemented in the core like this:
Instructions are entered as a series of virtual signals in the instruction ROM:
- O: operation code
- R: first read address
- S: second read address
- W: write address
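For anyone who wants to check a program by hand before building it, here is a rough Python model of the instruction set and the O/R/S/W encoding. It is a sketch under several assumptions that are not stated in the post: memory is a dict from address to a signal dict, a zero signal is treated as absent, division truncates toward zero and dividing by zero gives 0, and 32-bit overflow is ignored. Op 3 (Thread ID) is left out because the output signal is not specified. The names `alu` and `execute` are made up.

```python
def _clean(signals):
    """Drop zero entries (assumption: a zero signal is the same as absent)."""
    return {k: v for k, v in signals.items() if v != 0}


def _each(r, s, fn):
    """Apply fn to every signal name present in R or S."""
    return _clean({k: fn(r.get(k, 0), s.get(k, 0)) for k in r.keys() | s.keys()})


def _div(a, b):
    # Assumptions: truncate toward zero; divide-by-zero yields 0.
    if b == 0:
        return 0
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def alu(op, r, s):
    """Data operations from the list above; returns None for anything else."""
    a = s.get("A", 0)
    if op == 1:
        return _each(r, s, lambda x, y: x + y)
    if op == 2:
        return _each(r, s, lambda x, y: x - y)
    if op == 8:
        return _each(r, s, lambda x, y: int(x != y))
    if op == 9:
        return _each(r, s, lambda x, y: int(x > y))
    if op == 10:
        return _each(r, s, lambda x, y: int(x < y))
    if op == 11:
        return _clean({k: v for k, v in r.items() if s.get(k, 0) != 0})
    if op == 16:
        return _clean({"A": sum(r.values())})
    if op == 17:
        return _clean({k: v * a for k, v in r.items()})
    if op == 18:
        return _clean({k: _div(v, a) for k, v in r.items()})
    return None


def execute(instr, mem):
    """Run one instruction (a dict with O, R, S, W) against mem in place."""
    op = instr.get("O", 0)
    r = mem.get(instr.get("R", 0), {})
    s = mem.get(instr.get("S", 0), {})
    if op == 24:    # indirect load: address comes from the A signal of R
        mem[instr["W"]] = dict(mem.get(r.get("A", 0), {}))
    elif op == 25:  # indirect store: write S to the address in A of R
        mem[r.get("A", 0)] = dict(s)
    else:
        result = alu(op, r, s)
        if result is not None:  # no-op or unmodelled op: write nothing
            mem[instr["W"]] = result


mem = {1: {"iron-plate": 5, "copper-plate": 2}, 2: {"iron-plate": 3, "A": 4}}
execute({"O": 1, "R": 1, "S": 2, "W": 3}, mem)   # add registers 1 and 2
execute({"O": 16, "R": 3, "W": 4}, mem)          # total across register 3
execute({"O": 11, "R": 1, "S": 2, "W": 5}, mem)  # filter register 1 by 2
```

After these three instructions, address 3 holds the per-signal sum, address 4 holds the total as signal A, and address 5 holds only iron-plate, since that is the only signal of register 1 that is non-zero in register 2.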
EDIT: This design may freeze on instruction 45 if you add more instruction ROM, because I set a freeze point for debugging. See the deciders to the right side of the instruction pointers - one says IF A < 45...