Factorio Mainframe Project - Up to 60 IPS


Post by pruby »

Hi all,

I'm pretty new to Factorio, but I have designed CPUs for fun in a few other contexts, so I decided to sling one together out of combinators. I know a bunch of people have done so already, but here's my take on it. This is still a development and concept project, not a final release for general use, and it may contain bugs.

The whole spread out "development version" of the CPU looks like this:

Image

My requirements for this to be effective are:
  • Cheap and easy to build from blueprints.
  • Fast enough to be useful.
  • Simple enough to be easily programmed by hand.
  • Vanilla functional components only (I did use Foreman for blueprint export and Color Coding to mark areas).
Basically, it has to pull its weight - not be something that's more complicated and expensive to create than it's worth.

After thinking through this a lot, and a few iterations, the processor now has these features:
  • Harvard architecture - executes a program out of a separate instruction ROM.
  • 10-way "Barrel" Multithreading (I'll explain why in a bit).
  • 60 instructions per second (1 instruction per tick) peak throughput when all threads are active.
  • RISC-like instruction set - each instruction reads up to 3 memory locations (two direct, one indirect) and writes one result.
  • Data may contain all signals except the black colour signal. Most operations (e.g. addition, subtraction) act on all signals in parallel.
  • This is a relatively small build in number of combinators. Once compacted, the final usable CPU with 120 registers is likely to take the area of 4 substations.
  • The barrel architecture lends itself to an extremely dense register bank design, storing 6 registers per thread (60 total registers) in one substation's coverage area.
The most interesting feature is the 10-way hardware multithreading. This avoids some issues with pipelined architectures (instruction latency), and allows us to run many programs on the one CPU without them interfering with each other. The cost is that the total throughput (60 IPS) is divided evenly between the threads: 6 instructions per second per thread. This means that a single thread is between 2 and 3 times slower than other published single-threaded CPUs, and we may have to make multiple threads work together to get similar performance.
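
To make the throughput split concrete, here is a rough Python sketch of barrel scheduling (an illustration only, not part of the blueprint). It assumes plain round-robin issue, one thread per tick, at Factorio's 60 ticks per second; the function names are made up for this example and the real tick-to-thread mapping in the build may differ.

```python
TICKS_PER_SECOND = 60  # one instruction is issued per game tick
THREADS = 10           # barrel width from the post

def thread_for_tick(tick: int) -> int:
    """Thread ID (1..10) that issues on this tick, assuming simple round-robin."""
    return tick % THREADS + 1

def instructions_per_thread(ticks: int) -> dict[int, int]:
    """Count how many instructions each thread issues over `ticks` ticks."""
    counts = {t: 0 for t in range(1, THREADS + 1)}
    for tick in range(ticks):
        counts[thread_for_tick(tick)] += 1
    return counts

# Over one second, every thread gets the same share of the 60 issue slots.
per_thread = instructions_per_thread(TICKS_PER_SECOND)
print(per_thread)
```

Each thread ends up with 6 of the 60 slots, which is where the 6 instructions per second per thread figure comes from.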

The downsides of this design include:
  • The architecture must be fully pipelined to run at this speed. It therefore has to follow some rigid latency constraints. This makes adding custom operations somewhat harder.
  • Each single thread can run at only 1/10 of the total instruction throughput.
  • The memory requires three read lines and one write line, potentially all in use in a single tick, meaning we need a four-pole memory bus.
  • This also increases the size of basic memory registers from 4 combinators to 6 combinators (50% larger). Data ROM on the memory bus takes 3.5 combinators per address, more than double that of a single-read design like that used for instructions.
  • The CPU is relatively hard to debug, as it is free running, can't be stopped and is always in a changing state.
Data ROM is unfortunately huge due to needing 3 read lines per cell:

Image

This is a standard 12-register bank shared between all threads. The design is based on XKnight's memory cell, extended to triple read and independent write in a single tick:

Image

Image

However, registers that are not shared between threads can be much denser. Only 1/10th of them need to be read or written in any given cycle, so we can store the values in 10-tick delay lines and share all of the read/write hardware. Each register appears completely separate to each thread, storing a total of 60 values in nearly the same area as the 12 above:

Image

Image
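
The delay-line idea can be modelled in a few lines of Python. This is only a behavioural sketch under assumptions: `DelayLineRegister` is a hypothetical name, one value circulates per thread, and the thread whose turn it is may replace its own value as it passes the shared read/write point. It says nothing about how the combinators are actually wired.

```python
from collections import deque

THREADS = 10  # one slot per thread, so the line is 10 ticks long

class DelayLineRegister:
    """One logical register that holds a separate value for each thread."""

    def __init__(self):
        self.line = deque([0] * THREADS)

    def tick(self, write=None):
        """Advance one tick.

        Returns the value belonging to the thread whose turn it is. If `write`
        is given, that thread's value is replaced; otherwise it recirculates.
        """
        current = self.line.popleft()
        self.line.append(current if write is None else write)
        return current

reg = DelayLineRegister()
# First pass: each thread reads the initial 0 and stores its own ID.
first_pass = [reg.tick(write=thread_id) for thread_id in range(1, THREADS + 1)]
# Later passes: each thread sees only what it wrote, 10 ticks after writing it.
second_pass = [reg.tick() for _ in range(THREADS)]
third_pass = [reg.tick() for _ in range(THREADS)]
print(second_pass)
```

Six such lines would give the 6 registers per thread (60 values) mentioned earlier, all sharing a single set of read/write hardware.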

The instructions available are:
  • 0 or missing: No-op
  • 1: Add (all signals R + S)
  • 2: Subtract (all signals R - S)
  • 3: Thread ID (Number from 1 to 10 identifying the current thread. This is used to find the right program to run.)
  • 8: Not equal (output 1 for each signal in R and S where R != S, pass zero S for an "is set" operation)
  • 9: Greater than (output 1 for each signal in R and S where R > S, pass zero S for an "is positive" operation)
  • 10: Less than (output 1 for each signal in R and S where R < S, pass zero S for an "is negative" operation)
  • 11: Filter signals (Output the value of each signal in R where those signals are non-zero in S). Credit to Megatron for this design viewtopic.php?f=5&t=30643#p193765
  • 16: Total across (sum all signals in R, output as signal A)
  • 17: Multiply out (multiply each signal in R by the A signal in S).
  • 18: Divide out (divide each signal in R by the A signal in S).
  • 24: Indirect load (load the memory at the address in the A signal of R)
  • 25: Indirect store (store S at the address in the A signal of R)
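
For anyone who wants to check a program by hand, here is a hedged Python model of some of the opcodes listed above. Signals are modelled as a dict from signal name to value, with zero-valued signals absent. The function names and the item names in the example are invented for illustration, 32-bit wrap-around is ignored, and divide and the indirect load/store are left out.

```python
Signals = dict[str, int]  # signal name -> value; a zero signal is simply absent

def _clean(sig: Signals) -> Signals:
    """Drop zero entries, since a zero signal does not appear on a wire."""
    return {k: v for k, v in sig.items() if v != 0}

def _each(r: Signals, s: Signals, fn) -> Signals:
    """Apply fn to every signal that is present in R or S."""
    return _clean({k: fn(r.get(k, 0), s.get(k, 0)) for k in r.keys() | s.keys()})

def add(r, s):          return _each(r, s, lambda a, b: a + b)        # opcode 1
def subtract(r, s):     return _each(r, s, lambda a, b: a - b)        # opcode 2
def not_equal(r, s):    return _each(r, s, lambda a, b: int(a != b))  # opcode 8
def greater_than(r, s): return _each(r, s, lambda a, b: int(a > b))   # opcode 9
def less_than(r, s):    return _each(r, s, lambda a, b: int(a < b))   # opcode 10

def filter_signals(r, s):  # opcode 11: keep signals of R that are non-zero in S
    return _clean({k: v for k, v in r.items() if s.get(k, 0) != 0})

def total_across(r, s=None):  # opcode 16: sum of all signals in R, output as A
    return _clean({"A": sum(r.values())})

def multiply_out(r, s):  # opcode 17: multiply each signal in R by A from S
    return _clean({k: v * s.get("A", 0) for k, v in r.items()})

OPS = {1: add, 2: subtract, 8: not_equal, 9: greater_than, 10: less_than,
       11: filter_signals, 16: total_across, 17: multiply_out}

r = {"iron-plate": 5, "copper-plate": 2}
s = {"iron-plate": 3, "A": 4}
print(OPS[1](r, s))   # per-signal sum of R and S
print(OPS[11](r, s))  # only iron-plate survives the filter
```

Passing an empty dict as S gives the "is set", "is positive" and "is negative" variants described for opcodes 8 to 10.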
There is no jump instruction. Instead, the instruction pointer is mapped to address one and is writeable, so a branch is just a write to that address. Note there's a 1-cycle delay on this, so the instruction after the branch still executes before the jump takes effect.

These are all implemented in the core like this:

Image

Instructions are entered as a series of virtual signals in the instruction ROM:
  • O: operation code
  • R: first read address
  • S: second read address
  • W: write address
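
As a rough picture of how an O/R/S/W instruction and the writeable instruction pointer fit together, here is a tiny single-thread interpreter in Python. Several things here are assumptions for the sake of the sketch: the branch target is taken from the A signal of the value written to address one, a missing R or S reads as empty, only opcodes 1 and 2 are modelled, reading the instruction pointer is not modelled, and the memory addresses 10 and 12 are arbitrary.

```python
IP_ADDRESS = 1  # from the post: the instruction pointer is mapped to address one

def _each(r, s, fn):
    """Apply fn per signal and drop zero results (zero signals are absent)."""
    out = {k: fn(r.get(k, 0), s.get(k, 0)) for k in r.keys() | s.keys()}
    return {k: v for k, v in out.items() if v != 0}

OPS = {
    1: lambda r, s: _each(r, s, lambda a, b: a + b),  # add
    2: lambda r, s: _each(r, s, lambda a, b: a - b),  # subtract
}

def run(rom, memory, steps):
    """Execute `steps` instructions and return the ROM addresses executed."""
    ip, pending, trace = 1, None, []
    for _ in range(steps):
        instr = rom.get(ip, {})  # a missing ROM entry behaves as a no-op
        trace.append(ip)
        # A branch written by the previous instruction only takes effect now.
        next_ip = pending if pending is not None else ip + 1
        pending = None
        op = OPS.get(instr.get("O", 0))
        if op is not None:
            result = op(memory.get(instr.get("R"), {}), memory.get(instr.get("S"), {}))
            w = instr.get("W")
            if w == IP_ADDRESS:
                pending = result.get("A", 0)  # assumption: target is on signal A
            elif w is not None:
                memory[w] = result
        ip = next_ip
    return trace

memory = {10: {"A": 1}}  # constant 1, used as both the increment and the loop target
rom = {
    1: {"O": 1, "R": 12, "S": 10, "W": 12},  # counter at address 12 += 1
    2: {"O": 1, "R": 10, "W": IP_ADDRESS},   # branch back to instruction 1
    3: {"O": 1, "R": 12, "S": 10, "W": 12},  # delay slot: still runs before the jump
}
trace = run(rom, memory, steps=6)
print(trace, memory[12])
```

The trace shows instruction 3 running after every branch, which is the delayed-branch behaviour to keep in mind when laying out a loop.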
The current design is very spread out. Please note this contains coloured concrete from "Color Coding", used for marking. I'm working on a tool to help with reducing the area of these designs. The sample program included loads each thread's program address from the data ROM, and loops forever through instructions 1 to 3 if an address is missing. I believe thread 2 has a small test program in this blueprint.
blueprint
EDIT: This design may freeze on instruction 45 if you add more instruction ROM - I set a freeze point for debug. See the deciders to the right side of the instruction pointers - one says IF A < 45... :)