I wanted a good CPU to help automate tasks which are too complex for the more usual kind of Factorio combinator logic - but I couldn't find a nice design that I particularly liked, so I made my own CPU instead.
Screenshots:
Blueprints:
These had to be added as attached text files, since they are too large for the forum post size limit.EDIT: Replaced the blueprints with a slightly modified version that is now compatible with Space Age. The functionality of the CPU has not been affected in any way.
Overview:
Long story short, here's a quick outline of the design I've ended up with:- 57 powerful instructions; most of the arithmetic and logic ones execute in 6 ticks (10 IPS),
- x86-like addressing modes; most opcodes accept m32/r32 for the destination and imm32/m32/r32 as the source,
- 8 general purpose registers (including ESP),
- separate program (code) and data address spaces,
- packed program ROM: holds 10 instructions (lines of code) per each constant+decider combinator pair,
- memory-mapped I/O registers; easily extendable to any size,
- as-blueprinted it comes with 1200 lines of code ROM and 100 cells of data RAM, as well as 10 inputs and 10 outputs,
- microcode pipeline allows very complex and powerful instructions to be created (for example, POPAD reads 7 registers from the stack in a total of just 9 ticks),
- Factorio 1.x compatible: none of this fancy new Space Age stuff; it's some of the finest, time-proven, old-fashioned vacuum tube technology right here,
- now also compatible with Space Age.
Of course, there are also some important differences from the real x86.
After all, my design goal was not to create an exact reproduction of the x86's capabilities, but rather to create an efficient and effective CPU; completely sufficient for the kind of tasks that could be reasonably expected of it in Factorio.
To that end, I considered it counterproductive to try and exactly copy every single aspect of the x86, since attempting to do so would hugely increase the CPU's complexity for very little practical benefit.
In particular, the following x86 features didn't make the cut:
- interrupts of any sort (both hardware and software); I know that this will make at least some people unhappy, but I have no need for interrupts in my intended use cases, and adding support for them would entail significant extra complexity,
- everything involving the FPU (and floating point numbers in general) is not available,
- carry and overflow flags are not implemented (as most Factorio automation tasks don't really need to manipulate 32+bit numbers anyway),
- memory (RAM) addressing modes are significantly reduced compared to the real x86 (with realistically feasible amounts of RAM, you sure won't be working with complex data structures here anyway),
- some of the string handling instructions, such as INS/OUTS/SCAS, as well as REPZ/REPNZ prefixes (but not REP), are also gone,
- also not available are the CMOVcc instructions, as well as bit scan/test/set ones; all of those would be quite complex to implement and slow to execute within the limits imposed by Factorio's combinator logic (at least, as of 1.x).
But, enough about all that boring stuff - now let's take a look at what interesting things we can do with this.
Nuts & bolts:
First off, the most important part - instruction encoding and instruction set:HELL WORLD:
Ah yes, of course there is a HELL WORLD example program included with the CPU. Sure enough, I mostly play deathworld, why do you ask?But, more seriously now:
The example code is nothing special, it's just a rather simple implementation of a prime number generator which checks every consecutive odd positive integer if it is prime, and outputs it to the display if that is the case.
Because it is just an example program, I've deliberately avoided implementing some performance improvements in the code in order to showcase a wider variety of the available opcodes.
Code: Select all
Address	Opcode	Data	Mnemonic
entry_point:
1	80027	601	MOV	EDI,601
2	52	2	STOSD	2		;deal with the special case of 2 being a prime
3	30027	1	MOV	EBX,1
4	60027	600	MOV	EBP,600
5	21	7	JMP	.main_loop
.output_result:
6	210327	1	MOV	[EBP+1],EBX	;output result to display
.main_loop:
7	30002	2	ADD	EBX,2		;only check odd numbers - since by definition, evens >2 can't be prime
8	335	-	PUSH	EBX
9	4	22	CALL	isqrt
10	40227	-	MOV	ECX,EAX		;setup loop counter
11	40010	2	IDIV	ECX,2		;we only check odd divisors, so halve the loop counter
12	90027	1	MOV	ESI,1
.trial_loop:					;check every possible odd divisor from 3 to isqrt(x)
13	90002	2	ADD	ESI,2		;start with ESI=3
14	90307	-	CMP	ESI,EBX		;deal with the special case of input = 3, which would otherwise fail to be recognized as prime
15	20327	-	MOV	EAX,EBX
16	19	6	JZ	.output_result	;optimization: not having Jcc directly following CMP causes no extra delay to be inserted into the pipeline
17	20926	-	MOD	EAX,ESI		;does it divide exactly (without remainder)?
18	20255	-	TEST	EAX,EAX
19	25	13	LOOPNZ	.trial_loop	;loop while it doesn't
20	20	6	JNZ	.output_result	
21	21	7	JMP	.main_loop
;find integer square root using Heron's method, input is passed on the stack, output in EAX
isqrt:
22	60731	-	PMOV	EBP,ESP
23	22127	2	MOV	EAX,[EBP+2]	;s
24	40227	-	MOV 	ECX,EAX		;s
25	20043	1	SAR 	EAX,1		;x0=s/2
26	50427	-	MOV 	EDX,ECX 	;s
27	50210	-	IDIV	EDX,EAX		;s/x0
28	50202	-	ADD	EDX,EAX		;x0+(s/x0)
29	50043	1	SAR	EDX,1		;x1=(x0+(s/x0))/2
.loop:
30	50207	-	CMP	EDX,EAX		;x1,x0
31	16	38	JGE	.loop_end
32	20527	-	MOV	EAX,EDX 	;x0=x1
33	50427	-	MOV 	EDX,ECX 	;s
34	50210	-	IDIV	EDX,EAX		;s/x0
35	50202	-	ADD	EDX,EAX		;x0+(s/x0)
36	50043	1	SAR	EDX,1		;x1=(x0+(s/x0))/2
37	21	30	JMP	.loop
.loop_end:
38	60032	-	POP	EBP
39	38	1	RET	1		;EAX contains x0 (result)
Closing remarks:
Last but not least, some important notes about practical usage of this CPU:1. Normally, only one of the operands in an opcode can be a memory address; it is not possible to specify different memory addresses for the destination and the source for any instruction.
Note that technically it is valid to specify both the destination and source to point to the same memory address, however it would be rather unhelpful in most situations.
However, I can nonetheless see a few meaningful use cases, such as:
- using IMUL to square a variable stored in RAM,
- using XOR to clear a RAM variable in 1 instruction,
- using ABS on a memory location.
2. It is not possible to have both a memory operand destination and an immediate source value in the same instruction. Technically, this does work when used - however, the base memory address will be inevitably the same as the immediate value, which is generally quite unhelpful (even more so than in the preceding point).
3. Reading data values from program ROM is not possible, as there is no mechanism to allow program ROM contents to be accessed as data - and conversely, it is also not possible to execute data stored in RAM as code, either.
When using program ROM for the storage of constants (strings, lookup tables, etc) this has to be done by embedding the data values in valid instructions and executing them at an appropriate time.
If random access to the data is needed, storing the constants in RAM is one potential course of action. Also see example code below.
Another option would be to use the I/O logic with an array of inputs wired to constant combinators, to effectively implement a "data ROM" to use for lookup tables or other similar uses - without needing to use valuable RAM space for the storage of arrays of constants.
5. For the abovementioned purpose, r32 means any register other than EIP. As with the x86, there is no way to directly read the value of EIP.
6. I/O registers are mapped to the data RAM address space, and accessed in the exact same way as any other data RAM location. In the blueprint, it uses addresses 601...610 for the outputs and 801...810 for the inputs.
The "Raw fish" signals in the I/O block are obviously meant as placeholders; replace them with whatever signal you want to be using for any particular input or output, such as with the "Blue" signal used as an output to the display.
7. HCF instruction actually has a valid use, as a crude but functional form of a debugging breakpoint. Simply replace any instruction with HCF and the CPU will spinwait when it reaches that point, until it is replaced with another opcode.
8. I have not yet tested this in Space Age, only in 1.x version. However, I don't think there's anything in Space Age (based on the new features I know about) that should cause anything to break here.
9. While I have done a fair amount of testing on this CPU - including some rather obscure "corner cases" I could think of - and it appears to work fully as intended, nonetheless I cannot reasonably claim that it is totally entirely 100% bug-free; it is quite possible that there are still some weird corner case bugs lurking in there, waiting for the unwary victim.



