Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal set of low level words to build forth #92

Open
kt97679 opened this issue Dec 27, 2020 · 48 comments
Open

Minimal set of low level words to build forth #92

kt97679 opened this issue Dec 27, 2020 · 48 comments
Labels
implementation How to implement a Forth system question

Comments

@kt97679
Copy link

kt97679 commented Dec 27, 2020

Hi folks,

I was curious how many low level words do you need to build forth. So I took sod32 by Lennart Benschop which uses 32 low level words and started to redefine low level words via other low level words. The whole process can be seen here. At the end of the day I managed to reduce low level words to 7 primitives: nop, nand, !, @, um+, special, lit. Inner interpreter has additional logic for 'exit' and 'call'. I was running tester.fr to ensure all words behave as expected and also used run time to track performance. My final implementation runs 708 times slower than original sod32. This whole effort hardly can be used in practice, but it was a fun little project. I wonder if we really need 3 primitives to access memory: @, ! and lit? Please let me know if there is any trick I can use to reduce this.

Thanks,
Kirill.

@pebhidecs
Copy link

pebhidecs commented Dec 27, 2020 via email

@kt97679
Copy link
Author

kt97679 commented Dec 27, 2020

Hi Paul,

do you know by chance how we can implement arithmetic and boolean operations just using XC@, XC!, and
@execute? With all honesty I have no idea how this can be done.

Thanks,
Kirill.

@pebhidecs
Copy link

pebhidecs commented Dec 27, 2020 via email

@mitra42
Copy link

mitra42 commented Dec 27, 2020

I think I'd want to see the rest of for example Ting's set implemented in these 7 words ?

I started with Ting's set when building webForth, though there were a few more words that it made a HUGE difference to code - especially find or some part of it.

@mitra42
Copy link

mitra42 commented Dec 27, 2020

@paul if XC! , XC@ etc are used to write machine code, then I don't see this as a minimum set because all its done is move "code" into forth.

@kiril - what is "special" in your list.

@kt97679
Copy link
Author

kt97679 commented Dec 27, 2020

@mitra42 special is used to call os functions like read and write files. It can be implemented via syscalls.

@mitra42
Copy link

mitra42 commented Dec 27, 2020

A lot of the exercises in this tend to just implement a virtual machine (for example with a huge switch statement), and then define a few primitives on top of that, and then define forth in those primitives - for example I could define forth in one word opcode then define "rot" as 5 opcode. But if I do that, then I haven't really made the port any easier since I've still got to define about 30 words in my switch statement. It doesn't look like you've done this, but I still can't see for example how to build rot from your 7 words: nop, nand, !, @, um+, special, lit. can you define rot in those words so e can see your approach.

@kt97679
Copy link
Author

kt97679 commented Dec 27, 2020 via email

@mitra42
Copy link

mitra42 commented Dec 28, 2020

So if I read that commit correctly, you are using some working variables (at addresses 8, 12, 16) to allow things like ROT and SWAP to be defined.

(By the way, its coincidence that my example opcode two comments above uses the same word as SOD32 does)

@kt97679
Copy link
Author

kt97679 commented Dec 28, 2020 via email

@Bushmills
Copy link

Bushmills commented Jan 13, 2021

"I still can't see for example how to build rot from your 7 words: nop, nand, !, @, um+, special, lit. can you define rot in those words."

One can build flip flops and address decoders from NANDs, SRAM and registers from flip flops and decoders, and stacks from SRAM and register (if not implemented as cascaded shift register). From there, it needs moving stack tops between two stacks and from/to a temporary holding register (to avoid, at this point, swap), and voila, there's your ROT :)

@monsonite
Copy link

IMHO 7 primitives is a bit too few - and this is confirmed in the 708 times slower performance.

Both C.H. Ting and C.H. Moore settled on "about" 32 primitives as the minimum sub-set to allow efficient synthesis and execution of Forth.

Needless to say, you can do a lot with fewer instructions.

The PDP-8 was a perfectly viable machine capable of running 4K FOCAL and BASIC with just TAD, AND, DCA, ISZ, JMP, JMS plus the modify accumulator instructions (Clear, Complement, Shift left, shift right, set and clear carry etc) that were made available through the OPR class of instruction. I think a minimum of 16 and a maximum of 32 primitives would be needed depending on the machine architecture.

@kt97679
Copy link
Author

kt97679 commented Jan 15, 2021

@monsonite yes, I mentioned in the very beginning that there is not much practical use in the reducing of the number of primitives. This is more of the theoretical question what is most orthogonal set of words that can be used to build forth and can't be reduced.

@jacereda
Copy link

If it's a theoretical question you might be interested in https://en.wikipedia.org/wiki/One-instruction_set_computer

@kt97679
Copy link
Author

kt97679 commented Jan 15, 2021

@jacereda yes, I'm aware of the OISC, but I'm specifically interested in forth primitives that can be used to define the rest of the system. It is absolutely possible to implement forth in OISC, but in this case OISC single command will be used as an assembler, not as a forth primitive. And we will need to implement stack and everything else.

@Bushmills
Copy link

given that the logic of a NAND gate is enough to derive all other gates from it, and all kinds of gate logic in turn enough to model all primitives Forth would ever want to use, it seems that AND and 0= should all be needed, in addition to those words needed to glue them together such that new words can be written and executed - a NAND primitive alone shouldn't be enough to bootstrap a full system from it, as the NAND doesn't know about how to "connect" them, that is, how to build new words from it. It may also be a bit hefty to, say, having to implement a stack from NANDs alone. But as the question was about "needed", I suppose that required effort to implement a minimal system can be disregarded. Or is practicality also an aspect?

@alexdowad
Copy link

given that the logic of a NAND gate is enough to derive all other gates from it, and all kinds of gate logic in turn enough to model all primitives Forth would ever want to use, it seems that AND and 0= should all be needed...

Digital logic doesn't just require gates, but also something like flip-flops; some kind of memory.

@sturem
Copy link

sturem commented Jan 19, 2022 via email

@Bushmills
Copy link

And the same with memory: have enough flip-flops (from NANDs) and decoders (from NANDs), and you can build any amount of RAM (from just NANDs)

@GarthWilson
Copy link

When I try responding by email, it doesn't work; so I'm trying again in the github page:

True; but then you also need to be able to tri-state I/O bits. The only way you can do that with a NAND is if you add a separate output-enable input, or if the output is the wire-OR kind of thing which is a power hog and slows things down.

On 1/18/22 11:02 PM, Stuart wrote:

Ah, but with (several) NAND gates you can make a flipflop...

@Bushmills
Copy link

You don't need to tri-state outputs. You can also connect them as long as not both states are driven, as with open collector outputs. Those connect to a common usually high potential bar, pulled high by a pull-up, and any switched gate will pull it low. Or any number of active gates, doesn't matter. That's called a "wired or" and an alternative to tri-state when wanting to connect multiple outputs together.

@GarthWilson
Copy link

Right. As I said. That makes it a slow power hog though.

@Bushmills
Copy link

Would a minimal set of primitives, build from NANDs, really care much about power? But there may arise another problem when trying to build them from NANDs only which I didn't consider in my first post: the problem of lack of concurrency when "connecting" them in software.

@SirWumpus
Copy link

SirWumpus commented Jan 19, 2022 via email

@scherrey
Copy link

scherrey commented Jan 19, 2022 via email

@niclash
Copy link
Member

niclash commented Jan 19, 2022 via email

@monsonite
Copy link

You could look at sectorForth, implemented in under 512 bytes, to fit in the boot sector, in x86 assembly language

https://github.com/cesarblum/sectorforth

It was inspired by Bernd Paysan's post from 1996, using 8 primitives plus KEY and EMIT.

Have a look at the "Hello World" example to see how the language is brought up from scratch.

https://github.com/cesarblum/sectorforth/blob/master/examples/01-helloworld.f

@kt97679
Copy link
Author

kt97679 commented Jan 19, 2022 via email

@jwoehr
Copy link

jwoehr commented Jan 19, 2022 via email

@kt97679
Copy link
Author

kt97679 commented Jan 19, 2022 via email

@jwoehr
Copy link

jwoehr commented Jan 19, 2022 via email

@kt97679
Copy link
Author

kt97679 commented Jan 19, 2022 via email

@jwoehr
Copy link

jwoehr commented Jan 19, 2022 via email

@paraplegic
Copy link

paraplegic commented Jan 19, 2022 via email

@kt97679
Copy link
Author

kt97679 commented Jan 19, 2022 via email

@iru-
Copy link

iru- commented Jan 19, 2022

Hi folks, I would like to clarify that my original research was about discovering an orthogonal minimal set of words sufficient for building the whole forth system. Alan Kay once described fundamental parts of lisp as "Maxwell equations of the software https://www.gnu.org/software/mes/manual/html_node/LISP-as-Maxwell_0027s-Equations-of-Software.html" and I was curious what a forth analog. This was not about building the smallest forth in the world or using cpu with the smallest set of commands. Thanks, Kirill.

Hi Kiril,

You are welcome to read my (code) take on this matter in https://github.com/iru-/nopforth, which is a Forth dialect I've been writing for my own use for the past 4 years. It is definitely not a try at reaching a minimum set of primitives, but may provide another data point.

By the way, porting it to arm64 has organically reduced some of the "primitives" compared to the x64-64 code.

Kind regards,
Iruatã

@iru-
Copy link

iru- commented Jan 19, 2022

Embedded, bare metal, perhaps. Modern operating systems expend a fair bit of effort to ensure that the text (program) memory segments cannot be modified at runtime.

They sure do!
Check the dance I have to do to compile arm64 instructions on macOS on Apple Silicon at runtime: https://github.com/iru-/nopforth/blob/main/src/arm64/boot.s#L612

@RGD2
Copy link

RGD2 commented Jan 20, 2022 via email

@Bushmills
Copy link

It might also be necessary to describe better what a "low level word" is, as its definition has a massive impact on the number of "low level words" needed. Must they be Forth primitives, as required by any standard of choice? Can they be created for the sole purpose of providing tools for building a Forth? Do they have to correspond with or can be built from executable instructions of a host CPU? Different answers to these questions will result in a different set, in both composition as well as in size, of the "minimal set".
I assume that those low level "words" will used to be combined with each other, in such a way that they can be, say, executed in sequence, to form new, more complex behaviour. If those words from the minimal set don't need to be existing Forth primitives, there's a simple way to reduce the number of needed words to one already: I implement a low level word, which I call "dispatch". It receives an argument, and depending on its value, branches to one or another mode of behaviour coded into it. Say, calling it with 1 results in code which duplicates top of stack, 2 cause it to behave such a way that top of stack is discarded, 3. makes it swap two top stack items, and so on. Everything in one single low level word. Therefore, a complete Forth can be built from just one single low level word.
"But that's cheating" you might say - in that case, all solutions requiring some external processing, only interfaced by a small set of low level words to reference them, Like, memory architectures which yield results by appropriate addressing, But then, isn't using CPU instructions not also a way of using capabilities, external to the Forth we want to code?
So let's reduce this further, and only require that we can combine out lowest level code expression units. Evidently we are allowed to use CPU instructions. Customising the CPU is of course one route, but not even necessary: any CPU allows us to process all aspects if its capabilities by merely putting zeroes and ones into a suitable sequence. So we combine the minimal set
of "1" and "0" to obtain more complex behaviour. Say, we arrange them such that the effect, when executing a thusly built instruction, is to discard the top item of a stack. Or another, causing it to get duplicate. Therefore, all we need are zero instructions - we can, after all, build any by simply putting zeroes and ones into the proper sequence. "But that's cheating again" one may say - no, it isn't, as long as the "rules" defining what those "low level words" are don't exclude such an approach, and currently, the - nonexistant - definition of those don't.

@monsonite
Copy link

monsonite commented Jan 20, 2022 via email

@Bushmills
Copy link

@monsonite, then you are the living proof that a single-primitive Forth (-alike) is not just a hypothetical or academical exercise :-)

@kt97679
Copy link
Author

kt97679 commented Jan 22, 2022

Maybe my question should be transformed to "what is absolutely minimal amount of the assembly code we need to write to implement ANS forth on the 64 bit linux system"?

@iru-
Copy link

iru- commented Jan 22, 2022 via email

@iru-
Copy link

iru- commented Jan 22, 2022 via email

@Bushmills
Copy link

CPUs capable of running 64 bit Linux may have loadable microcode (Intel, AMD) or exist as, say, Verilog source which, after compiling and synthesizing, can be loaded into an FPGA (such as RISC-V) - will any "cheats" employing customization of any of those be acceptable? Like, adding an equivalent of such a "dispatch" instruction?

@RGD2
Copy link

RGD2 commented Jan 22, 2022 via email

@kt97679
Copy link
Author

kt97679 commented Jan 22, 2022

@iru- I looked at Linux.s but it is just aggregator of other files so I assume you mean arch.ns ? Yes, your system looks pretty minimal but probably it can be minimized even further? Since you are solving some practical tasks this may not be reasonable in your case since system will become much slower.

@Bushmills let's not use any cheats.

@RGD2 I just tried to narrow down requirements. We definitely can use any other OS and any other architecture (or like you noted we can use wasm). I looked into both sectorforth and sectorlisp and those are amazing projects but they are about minimizing the whole code of the system. I'm not trying to do that. I try to see how much of the system we need to define in the low level code so the rest of the system can be defined via that low level base. From my research I see that we can build complete forth system with only 9 words implemented in the low level assembly. I was curious if this is the limit or we can reduce number of those words further.

@iru-
Copy link

iru- commented Jan 23, 2022

@iru- I looked at Linux.s but it is just aggregator of other files so I assume you mean arch.ns ? Yes, your system looks pretty minimal but probably it can be minimized even further?

Yes, Linux.s is an aggregator. The real work is done by:

  • x86_64/boot.s: the main routine and the bulk of what's needed in assembly to start interpreting/compiling nop code. This is mostly independent of OS, but does assume some POSIX interfaces.
  • x86_64/sysv.s: utility routines to aid in interfacing with the Linux/*BSD environment around nop, such as command-line arguments and loading dynamic libraries.
  • dicts.s: assembled-in dictionary

You mentioned x86_64/arch.ns. That's the only architecture-dependent source in nop, not in assembly, itself. Everything after it included in Linux.s is portable.

As you correctly guessed, the assembly code can be minimized further. For fun, I've played and was successful with not interpreting numbers in assembly at all, leaving that to routines in nop. The end result was a little less clear than the current code, so I didn't commit it to the main branch. Also, I'm currently porting nop to arm64 and it turned out that some assembly ended up being implemented in arm64's arch.ns just because it didn't need to be in assembly.

Since you are solving some practical tasks this may not be reasonable in your case since system will become much slower.

I write and use nop for practical tasks (and having fun!). However, I never cared specifically about it being much slower or faster than it is. I just try to write nop itself in a way that both it and the programs written in it are not too complex.

As an example, initially I tried to use syscalls directly for my interaction with the OS. It turns out that unix-like operating systems don't have a very strict definition of who should implement the expected behavior of a call, IOW this behavior is usually accomplished by a combination of the syscall itself and its counterpart in libc. To avoid having to deal with this problem altogether, I stopped dealing with syscalls directly and started using libc.

Regarding performance, it doesn't seem like nop is too bad. Whereas cat(1)-ing its README on Linux takes around 0.002s, using nop's examples/cat.ns on the same file takes around 0.005s, but the latter involves bootstraping nop -- which compiles the whole nucleus on the fly --, compiling cat.ns and running it.

@ruv ruv added question implementation How to implement a Forth system labels Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
implementation How to implement a Forth system question
Projects
None yet
Development

No branches or pull requests