Steps moving forward

Jack Lloyd-Walters edited this page Aug 15, 2021 · 12 revisions

Lay out the groundwork for what needs to be done:

An open issue for this topic is also found here

And a project board for the syntax can be found here

And the entire list of syntax can be found here

Steps moving forward

Decide on a name, and file extension

We have Python and .py, C and .c, JavaScript and .js.

What does this language have?

Maybe Verboscript with the .vrbo extension?

Verbose, Verbose, Verbose

The entire concept for this language is plain English representation. Anyone, irrespective of programming experience, should be able to read and understand exactly what is happening. For an example of how this can be useful, consider the following:

This example in Assembly:

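        ; 32-bit x86, NASM syntax, cdecl calling convention
        extern _printf
        global _main

        section .text
_main:
        push ebx             ; ebx is callee-saved, so preserve it
        xor  ebx, ebx        ; counter = 0
loop1:
        push ebx             ; second argument: the counter
        push format          ; first argument: the format string
        call _printf
        add  esp, 8          ; caller removes both arguments
        inc  ebx
        cmp  ebx, 5
        jl   loop1
        pop  ebx
        xor  eax, eax        ; return 0
        ret

        section .data
format: db   '%d', 10, 0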

Is roughly equivalent to this example in C:

#include <stdio.h>
int main(void)
{
    for (int x = 0; x < 5; x++)
    {
        printf("%d\n", x);
    }
    return 0;
}

Which is equivalent to this example in Python:

for x in range(0, 5):
    print(x)

Which could potentially be this, in Verboscript:

start a counter at zero, then repeat the following five times:
    show the counter

Each layer of abstraction is easier to understand in plain English, at the cost of requiring more machinery to execute it (Assembly maps almost one-to-one onto the machine code the CPU runs, while Python needs an interpreter, typically CPython, which is itself a C program that had to be compiled and assembled, before it can be executed).

Decide on the syntax, and any included features, e.g.:

How does this language represent variables, loops, functions?

Does this language try to understand spelling mistakes, like in English?

Come up with a whole host of useful examples, and possibly their equivalent in Python.

More technical details

Is this language Static or dynamic?

  • Static: Variables in this language can contain only a single type of data (integer, string, list, etc) that is set when the variable is defined

  • Dynamic: Variables can contain any type of data, and can be chopped and changed throughout.

  • Pros/Cons: Dynamic is more intuitive and potentially simpler to program, but static catches type errors before the program runs, rather than partway through execution
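
To make the trade-off concrete, here is a minimal sketch in Python, which is dynamic but allows optional type annotations. The function names add_four and add_four_checked are made up for illustration. The annotated version behaves the same at runtime, but an external static checker such as mypy could flag a bad call before the program ever runs.

```python
def add_four(value):
    """Dynamic style: any value is accepted, so a mistake only shows up when this line runs."""
    return value + 4


def add_four_checked(value: int) -> int:
    """Annotated style: a static checker can reject add_four_checked("five") before running."""
    return value + 4


print(add_four(5))

try:
    add_four("five")  # nothing stops this call until the addition is attempted
except TypeError as error:
    print("type error found at runtime:", error)
```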

Interpreted or compiled?

  • Interpreted: The language is read and executed line by line by a program written in another language

  • Compiled: The language is translated directly into machine code before execution, which can then be run directly.

  • Transpiled: Another option, where a language is translated into another language, and the translated file is executed.

  • Pros/Cons: Compiled is faster, but interpreted is much simpler

  • Decision: The language will be interpreted
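
As a rough illustration of interpreting versus transpiling, the sketch below handles a single made-up command, show, in both ways. The command name and both function names are assumptions for the example only, not settled syntax.

```python
def interpret(line):
    """Interpreted: work out what the line means and act on it straight away."""
    command, _, argument = line.partition(" ")
    if command != "show":
        raise ValueError(f"unknown command: {command!r}")
    return argument  # the text that would be shown on screen


def transpile(line):
    """Transpiled: emit equivalent Python source, to be executed later."""
    command, _, argument = line.partition(" ")
    if command != "show":
        raise ValueError(f"unknown command: {command!r}")
    return f"print({argument!r})"


print(interpret("show hello world"))
print(transpile("show hello world"))
```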

Bytecode or Tree-Walk?

  • Bytecode: The language is read, and converted into a linear series of small instructions that can be executed very efficiently

  • Tree-Walk: The language is parsed into a syntax tree, where each branch holds the sub-instructions belonging to its parent, and the program is executed by walking the tree node by node

  • Pros/Cons: Tree-Walk is easier to program, but Bytecode is generally faster and uses less memory

  • Decision: The language will utilise Bytecode
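
The difference can be sketched on the expression x + 4. This is only an illustration: the node shapes, the opcode names (PUSH, LOAD, ADD), and the choice of a stack machine are all assumptions, not a design for this language.

```python
# Tree-walk: nested tuples form a syntax tree, evaluated recursively.
tree = ("add", ("var", "x"), ("num", 4))


def walk(node, variables):
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return variables[node[1]]
    if kind == "add":
        return walk(node[1], variables) + walk(node[2], variables)
    raise ValueError(f"unknown node: {kind!r}")


# Bytecode: the same expression flattened into a linear list of instructions.
bytecode = [("LOAD", "x"), ("PUSH", 4), ("ADD", None)]


def run(instructions, variables):
    stack = []
    for opcode, operand in instructions:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "LOAD":
            stack.append(variables[operand])
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")
    return stack.pop()


variables = {"x": 5}
print(walk(tree, variables))
print(run(bytecode, variables))
```

Both calls give the same answer. The tree version needs a recursive call per node, while the bytecode version is a single flat loop, which is where the speed difference tends to come from.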

Object oriented, Procedural, or Functional? see this page for a whole slew of paradigms

  • Object oriented: The language primarily uses and modifies objects (such as instances of classes), e.g. Python

  • Procedural: The language primarily uses procedure calls, written in exactly the order that the computer should execute them, e.g. BASIC

  • Functional: The language relies on functions that take data in and return new data, rather than modifying it in place, e.g. Clojure

  • Pros/Cons: Honestly, they're all much of a muchness. I personally prefer a blend of OOP (object oriented programming) and functional.
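
For a feel of how the three styles differ, here is the same small task (adding up the squares of 1, 2 and 3) written three ways in Python. The names are illustrative only.

```python
numbers = [1, 2, 3]

# Procedural: statements written in the order the computer should run them.
procedural_total = 0
for number in numbers:
    procedural_total = procedural_total + number * number


# Object oriented: the running total lives inside an object that is modified.
class Accumulator:
    def __init__(self):
        self.total = 0

    def add_square(self, number):
        self.total += number * number


accumulator = Accumulator()
for number in numbers:
    accumulator.add_square(number)

# Functional: functions applied to data, with nothing modified in place.
functional_total = sum(map(lambda number: number * number, numbers))

print(procedural_total, accumulator.total, functional_total)
```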

How extensive are the data types?

  • Do we distinguish between strings and numbers, or are they all just 'values'?

  • What about between integers and decimals, or are they all just numbers?

  • Perhaps another distinction? (Maybe all numbers could be complex, and we distinguish by real/imaginary, who knows?)
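
For reference, Python keeps all of these distinctions, which the small sketch below shows. The describe helper is hypothetical and only there to label each value. Note that the three numeric forms of five compare equal to each other, while the string does not.

```python
def describe(value):
    """Hypothetical helper: name the category a value falls into."""
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "decimal"
    if isinstance(value, complex):
        return "complex"
    return "value"


for value in ["5", 5, 5.0, complex(5, 0)]:
    # Second column: does this value compare equal to the integer 5?
    print(describe(value), value == 5)
```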

And any other technical considerations I've missed

Build stuff

Build the python scripts that will make this language work

This depends on the technical details above, but should loosely follow this order:

  1. A Lexer (or tokeniser) that converts a plaintext file into tokens that represent language syntax.
    • e.g. x = 5; print(x + 4) becomes [IDENTIFIER:x, EQUAL, NUMBER:5, SEMICOLON, IDENTIFIER:print, LEFTPAREN, IDENTIFIER:x, PLUS, NUMBER:4, RIGHTPAREN]
    • Tokenised script is much easier to work with, compared to a plaintext file.
  2. A Parser that takes the tokens, and converts them into an intermediate representation (an abstract syntax tree, or bytecode) that is easier to execute.
    • e.g. [IDENTIFIER:x, EQUAL, NUMBER:5, SEMICOLON, IDENTIFIER:print, LEFTPAREN, IDENTIFIER:x, PLUS, NUMBER:4, RIGHTPAREN] becomes:
      • [variable_declaration("x", 5), function_call("print", operation("add", variable_fetch("x"), 4))] in a tree walk environment.
      • [Declare, x, 5, Function, Print, Add, x, 4] in a bytecode environment
  3. A compiler that takes the intermediate representation, and checks all variables and the like, to ensure the code can run.
  4. An interpreter that executes the intermediate representation.
    • Note: If the language is interpreted, steps 3 and 4 are done at the same time, by the same script.
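
Step 1 could be sketched roughly as below, using Python's re module with one named group per token type. The token names are assumptions for this example: parentheses are called LEFTPAREN and RIGHTPAREN here, and the semicolon gets a SEMICOLON token of its own. It tokenises the Python-like example from the list above, not the final plain English syntax.

```python
import re

# One (name, pattern) pair per token type; order matters when patterns overlap.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("EQUAL", r"="),
    ("PLUS", r"\+"),
    ("LEFTPAREN", r"\("),
    ("RIGHTPAREN", r"\)"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),      # whitespace is matched, then thrown away
    ("MISMATCH", r"."),    # any other character is an error
]
TOKEN_PATTERN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenise(source):
    """Convert plain text into a flat list of token strings."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind in ("NUMBER", "IDENTIFIER"):
            tokens.append(f"{kind}:{text}")  # keep the value alongside the type
        else:
            tokens.append(kind)
    return tokens


print(tokenise("x = 5; print(x + 4)"))
```

A parser would then consume this list from left to right, which is far easier than working on raw characters.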

And a testing suite that compares equivalent Python scripts to this language, so we can ensure behaviour is correct. e.g. if this language uses show hello world to print to the screen, then it should give the same result as Python's print("hello world")
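
A comparison like this could be sketched as follows. Everything here is an assumption for illustration: run_verboscript is a stand-in that only understands show, and would be replaced by the real interpreter once it exists. The Python side is run with exec and its printed output is captured using contextlib.redirect_stdout.

```python
import contextlib
import io


def run_python(source):
    """Run Python source and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


def run_verboscript(source):
    """Hypothetical stand-in for the real interpreter: supports only `show <text>`."""
    output = []
    for line in source.splitlines():
        command, _, argument = line.strip().partition(" ")
        if command == "show":
            output.append(argument + "\n")
        elif command:
            raise ValueError(f"unknown command: {command!r}")
    return "".join(output)


def same_behaviour(verboscript_source, python_source):
    """True when both programs produce identical printed output."""
    return run_verboscript(verboscript_source) == run_python(python_source)


print(same_behaviour("show hello world", 'print("hello world")'))
```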