I wanted a good CPU to help automate tasks which are too complex for the more usual kind of Factorio combinator logic - but I couldn't find a nice design that I particularly liked, so I made my own CPU instead.
Screenshots:
Blueprints:
These had to be added as attached text files, since they are too large for the forum post size limit.EDIT: Replaced the blueprints with a slightly modified version that is now compatible with Space Age. The functionality of the CPU has not been affected in any way.
Overview:
Long story short, here's a quick outline of the design I've ended up with:- 57 powerful instructions; most of the arithmetic and logic ones execute in 6 ticks (10 IPS),
- x86-like addressing modes; most opcodes accept m32/r32 for the destination and imm32/m32/r32 as the source,
- 8 general purpose registers (including ESP),
- separate program (code) and data address spaces,
- packed program ROM: holds 10 instructions (lines of code) per each constant+decider combinator pair,
- memory-mapped I/O registers; easily extendable to any size,
- as-blueprinted it comes with 1200 lines of code ROM and 100 cells of data RAM, as well as 10 inputs and 10 outputs,
- microcode pipeline allows very complex and powerful instructions to be created (for example, POPAD reads 7 registers from the stack in a total of just 9 ticks),
- Factorio 1.x compatible: none of this fancy new Space Age stuff; it's some of the finest, time-proven, old-fashioned vacuum tube technology right here,
- now also compatible with Space Age.
Of course, there are also some important differences from the real x86.
After all, my design goal was not to create an exact reproduction of the x86's capabilities, but rather to create an efficient and effective CPU; completely sufficient for the kind of tasks that could be reasonably expected of it in Factorio.
To that end, I considered it counterproductive to try and exactly copy every single aspect of the x86, since attempting to do so would hugely increase the CPU's complexity for very little practical benefit.
In particular, the following x86 features didn't make the cut:
- interrupts of any sort (both hardware and software); I know that this will make at least some people unhappy, but I have no need for interrupts in my intended use cases, and adding support for them would entail significant extra complexity,
- everything involving the FPU (and floating point numbers in general) is not available,
- carry and overflow flags are not implemented (as most Factorio automation tasks don't really need to manipulate 32+bit numbers anyway),
- memory (RAM) addressing modes are significantly reduced compared to the real x86 (with realistically feasible amounts of RAM, you sure won't be working with complex data structures here anyway),
- some of the string handling instructions, such as INS/OUTS/SCAS, as well as REPZ/REPNZ prefixes (but not REP), are also gone,
- also not available are the CMOVcc instructions, as well as bit scan/test/set ones; all of those would be quite complex to implement and slow to execute within the limits imposed by Factorio's combinator logic (at least, as of 1.x).
But, enough about all that boring stuff - now let's take a look at what interesting things we can do with this.
Nuts & bolts:
First off, the most important part - instruction encoding and instruction set:HELL WORLD:
Ah yes, of course there is a HELL WORLD example program included with the CPU. Sure enough, I mostly play deathworld, why do you ask?But, more seriously now:
The example code is nothing special, it's just a rather simple implementation of a prime number generator which checks every consecutive odd positive integer if it is prime, and outputs it to the display if that is the case.
Because it is just an example program, I've deliberately avoided implementing some performance improvements in the code in order to showcase a wider variety of the available opcodes.
Code: Select all
Address Opcode Data Mnemonic
entry_point:
1 80027 601 MOV EDI,601
2 52 2 STOSD 2 ;deal with the special case of 2 being a prime
3 30027 1 MOV EBX,1
4 60027 600 MOV EBP,600
5 21 7 JMP .main_loop
.output_result:
6 210327 1 MOV [EBP+1],EBX ;output result to display
.main_loop:
7 30002 2 ADD EBX,2 ;only check odd numbers - since by definition, evens >2 can't be prime
8 335 - PUSH EBX
9 4 22 CALL isqrt
10 40227 - MOV ECX,EAX ;setup loop counter
11 40010 2 IDIV ECX,2 ;we only check odd divisors, so halve the loop counter
12 90027 1 MOV ESI,1
.trial_loop: ;check every possible odd divisor from 3 to isqrt(x)
13 90002 2 ADD ESI,2 ;start with ESI=3
14 90307 - CMP ESI,EBX ;deal with the special case of input = 3, which would otherwise fail to be recognized as prime
15 20327 - MOV EAX,EBX
16 19 6 JZ .output_result ;optimization: not having Jcc directly following CMP causes no extra delay to be inserted into the pipeline
17 20926 - MOD EAX,ESI ;does it divide exactly (without remainder)?
18 20255 - TEST EAX,EAX
19 25 13 LOOPNZ .trial_loop ;loop while it doesn't
20 20 6 JNZ .output_result
21 21 7 JMP .main_loop
;find integer square root using Heron's method, input is passed on the stack, output in EAX
isqrt:
22 60731 - PMOV EBP,ESP
23 22127 2 MOV EAX,[EBP+2] ;s
24 40227 - MOV ECX,EAX ;s
25 20043 1 SAR EAX,1 ;x0=s/2
26 50427 - MOV EDX,ECX ;s
27 50210 - IDIV EDX,EAX ;s/x0
28 50202 - ADD EDX,EAX ;x0+(s/x0)
29 50043 1 SAR EDX,1 ;x1=(x0+(s/x0))/2
.loop:
30 50207 - CMP EDX,EAX ;x1,x0
31 16 38 JGE .loop_end
32 20527 - MOV EAX,EDX ;x0=x1
33 50427 - MOV EDX,ECX ;s
34 50210 - IDIV EDX,EAX ;s/x0
35 50202 - ADD EDX,EAX ;x0+(s/x0)
36 50043 1 SAR EDX,1 ;x1=(x0+(s/x0))/2
37 21 30 JMP .loop
.loop_end:
38 60032 - POP EBP
39 38 1 RET 1 ;EAX contains x0 (result)
Closing remarks:
Last but not least, some important notes about practical usage of this CPU:1. Normally, only one of the operands in an opcode can be a memory address; it is not possible to specify different memory addresses for the destination and the source for any instruction.
Note that technically it is valid to specify both the destination and source to point to the same memory address, however it would be rather unhelpful in most situations.
However, I can nonetheless see a few meaningful use cases, such as:
- using IMUL to square a variable stored in RAM,
- using XOR to clear a RAM variable in 1 instruction,
- using ABS on a memory location.
2. It is not possible to have both a memory operand destination and an immediate source value in the same instruction. Technically, this does work when used - however, the base memory address will be inevitably the same as the immediate value, which is generally quite unhelpful (even more so than in the preceding point).
3. Reading data values from program ROM is not possible, as there is no mechanism to allow program ROM contents to be accessed as data - and conversely, it is also not possible to execute data stored in RAM as code, either.
When using program ROM for the storage of constants (strings, lookup tables, etc) this has to be done by embedding the data values in valid instructions and executing them at an appropriate time.
If random access to the data is needed, storing the constants in RAM is one potential course of action. Also see example code below.
Another option would be to use the I/O logic with an array of inputs wired to constant combinators, to effectively implement a "data ROM" to use for lookup tables or other similar uses - without needing to use valuable RAM space for the storage of arrays of constants.
5. For the abovementioned purpose, r32 means any register other than EIP. As with the x86, there is no way to directly read the value of EIP.
6. I/O registers are mapped to the data RAM address space, and accessed in the exact same way as any other data RAM location. In the blueprint, it uses addresses 601...610 for the outputs and 801...810 for the inputs.
The "Raw fish" signals in the I/O block are obviously meant as placeholders; replace them with whatever signal you want to be using for any particular input or output, such as with the "Blue" signal used as an output to the display.
7. HCF instruction actually has a valid use, as a crude but functional form of a debugging breakpoint. Simply replace any instruction with HCF and the CPU will spinwait when it reaches that point, until it is replaced with another opcode.
8. I have not yet tested this in Space Age, only in 1.x version. However, I don't think there's anything in Space Age (based on the new features I know about) that should cause anything to break here.
9. While I have done a fair amount of testing on this CPU - including some rather obscure "corner cases" I could think of - and it appears to work fully as intended, nonetheless I cannot reasonably claim that it is totally entirely 100% bug-free; it is quite possible that there are still some weird corner case bugs lurking in there, waiting for the unwary victim.