The S Expression Language
The S-Expression language in Atomic Database is meant to be a predictable, advanced, and complete interface for all of Atomic Database's capabilities, both simple and complex. It's meant for creating advanced rules, complicated queries, and so on, but it is also capable and very succinct for basic queries, meaning that once you become comfortable with it, it can be more productive to use it even for simple things. Personally, I still use the NL query system for basic queries, simply because it requires fewer special characters to be typed. YMMV.
Before we move on, I'm assuming minimal to no prior programming experience outside of Atomic Database, but at least some experience with the natural language query system and the GUI, enough to get a feel for how AD uses data and evaluates queries. I'm going to be using some possibly unfamiliar words going forward, so here are some definitions:
- DSL: A Domain Specific Language. This is a programming language designed to do one specific thing. It might be good at other things, but usually not.
- Programming Language: a sort of formal notation, like math for instance, that you use to communicate with a computer.
- General-Purpose Programming Language: Also known as a Turing-complete programming language, this is a sort of programming language that could be used to do any sort of task you want (assuming you had the right applications and libraries). A general purpose programming language can represent (or "do") anything any other programming language can "do".
- Unification: When you do unification, you combine two lists of "things". These "things" usually are one of a few things: numbers, names, strings, lists, or variables. Variables have some special behavior that I'll talk about in a bit. When you unify "things" which are not variables, they only "succeed" in unifying if they're the same thing. So `1` unifies with `1` and nothing else, and so on. If I were to try and unify `1`, `2`, and `3` with `1`, `2`, and `4`, the entire unification would fail, because one of the terms ("things") doesn't unify with its counterpart. Variables are special, however. If you unify a "free" variable with something that isn't a variable, the variable becomes "bound" (the opposite of "free") and that variable becomes equal to the value it was unified with. If you unify a "bound" variable, the unificator looks up the value of that variable (from a previous unification, usually), and checks to see if the value of that variable unifies with the variable's counterpart in the unification expression. You'll see some examples of how this works later on, in the Bare Unification section.
- Backtracking: Backtracking is how Atomic Database tries to "solve" the problems you give it. When you give it queries, it looks at them like logic problems, where it tries to find all (or at least most of) the solutions that fit all of the limits and explanations you add to the query. How this works is, when you ask it to bind a value to a variable, it chooses the first thing that works. Then, it goes through the rest of your query choosing the first thing that works. If it gets to the end and it has a valid answer, it returns that to you! If you're not happy with that answer or want more, you tell it to find you the next answer. When you do that, it backtracks, starting at the end of the executed program and going backward until it finds a statement (or form, or unification, or predicate) that has other solutions that might work. When it finds one, it chooses the next solution, and from there it plays the program forwards using that new solution and returns the answer to you. If you keep asking for the next solution, it keeps going back and finding other valid solutions, and when it runs out of valid solutions at each place that had multiple solutions, it goes back even further, to the next spot that had multiple possible solutions, and tries another one there, and so on. On and on it goes, until it's exhausted the entire tree of possible solutions.
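The unification rules in the definition above can be sketched in Python. This is only an illustration of the idea, not Atomic Database's actual code: the `Var` class, the `walk` helper, and the dictionary used for bindings are all assumptions made for this example.

```python
class Var:
    """A logic variable, identified by its name (hypothetical helper)."""
    def __init__(self, name):
        self.name = name


def walk(term, bindings):
    # Follow a bound variable to its value; free variables come back unchanged.
    while isinstance(term, Var) and term.name in bindings:
        term = bindings[term.name]
    return term


def unify(left, right, bindings=None):
    """Return the extended bindings if the terms unify, or None on failure."""
    bindings = dict(bindings or {})
    left, right = walk(left, bindings), walk(right, bindings)
    if isinstance(left, Var):
        # A free variable becomes bound to its counterpart.
        if not (isinstance(right, Var) and right.name == left.name):
            bindings[left.name] = right
        return bindings
    if isinstance(right, Var):
        bindings[right.name] = left
        return bindings
    if isinstance(left, list) and isinstance(right, list):
        # Lists unify element by element; one failure fails the whole thing.
        if len(left) != len(right):
            return None
        for l_item, r_item in zip(left, right):
            bindings = unify(l_item, r_item, bindings)
            if bindings is None:
                return None
        return bindings
    # Non-variables only unify with the same thing.
    return bindings if left == right else None


print(unify([1, 2, 3], [1, 2, 3]))                      # {}
print(unify([1, 2, 3], [1, 2, 4]))                      # None
print(unify([1, 2, Var("A"), "b"], [1, 2, "a", "b"]))   # {'A': 'a'}
```

A bound variable is looked up first, so `unify(Var("X"), 2, {"X": 1})` fails while `unify(Var("X"), 1, {"X": 1})` succeeds without adding anything new.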
Although Atomic Database's S-Expression Language (SEL) features S-Expressions (obviously), a syntax almost entirely unique to Lisp and Scheme derivatives, it is important to understand that AD's SEL is not a Lisp. The order of semantic elements in forms is completely different, as is the execution and evaluation model, and it does not (at the moment) have functions capable of taking other functions as arguments, or even what other languages would call functions at all. Instead, the S-Expression language might best be thought of as a DSL for querying and manipulating the Atomic Database's unification and logic engine. It is not intended to be a general purpose programming language, and it is not Turing complete, although with the addition of lexical scoping, immediate functions and recursion (all of which are on the README) that could change. The basic syntax for an SEL expression is as follows:
(<1> <rule/attribute name> <2> <3> ... <n>)
(See sidebar for information on how to read code examples, if you're confused).
It is important to understand that the rule or attribute name must always be the second element in a form. It does not make sense to call a rule or attribute without any method of output or any input variables to do calculations on, because at the moment (and in all likelihood, in the future too) rules and attributes are "pure": they don't have side-effects. So, if the thing in the second position isn't a valid attribute or rule name, Atomic Database will show you an error. Furthermore, it does not make sense to call an attribute without the respective entity and value (even if one or more of them are variables), so Atomic Database will show you an error if you do that. It is OK, however, to call a rule with just one argument. Atomic Database therefore first checks whether the thing in the second position is an attribute or a rule: if it's an attribute, it looks for exactly 2 arguments in total, and if it's a rule, one or more is just fine.
As an example, if `get_foo` is a rule (pretending it has whatever number of arguments we want in the examples below) and `name` is an attribute, here are some examples (by no means is the list of invalid, or valid, forms exhaustive):
Valid | Not Valid |
---|---|
(E name V) | (E name) or (name V) or (E V) |
("entity" name V) | ("entity") or ("entity" name) |
(X get_foo) | (get_foo) |
(X get_foo Y Z A B C D) | (get_foo X Y Z A B C D) |
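The checks described above can be sketched as a small Python function. This is only a guess at the shape of the validation, not the real implementation: forms are modeled as plain lists of tokens, and `ATTRIBUTES`, `RULES`, and `check_form` are names invented for this example.

```python
ATTRIBUTES = {"name"}    # hypothetical set of known attribute names
RULES = {"get_foo"}      # hypothetical set of known rule names


def check_form(form):
    """Raise ValueError if a form breaks the rules described above."""
    if len(form) < 2:
        raise ValueError("a form needs at least one argument and a name")
    head = form[1]               # the name always sits in second position
    arg_count = len(form) - 1    # everything except the name is an argument
    if head in ATTRIBUTES:
        if arg_count != 2:
            raise ValueError(f"attribute {head!r} needs an entity and a value")
    elif head not in RULES:
        raise ValueError(f"{head!r} is not a known attribute or rule")
    # Rules accept one or more arguments, so nothing else to check.
    return True


def is_valid(form):
    try:
        return check_form(form)
    except ValueError:
        return False


print(is_valid(["E", "name", "V"]))       # True
print(is_valid(["E", "name"]))            # False
print(is_valid(["X", "get_foo"]))         # True
print(is_valid(["get_foo", "X", "Y"]))    # False
```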
Just like the natural language query system has conjugations, so does SEL. It has three conjugations at its disposal, each with very specific behaviors:
Name | Syntax Format | Behavior |
---|---|---|
And | (& ...) | Evaluates the forms inside it (any number of them) sequentially. Carries over bindings made from one form to the next, and treats solving them as a unit (i.e. it backtracks to solve all the constraints in the and conjugation, not just each constraint individually). |
Or | (\| ...) | Evaluates the forms inside it (any number of them) in parallel (not actually at the same time, yet, but in theory). Bindings made in each form do not carry over to the next one, and they are each solved individually, with all valid answers from any of the forms output. |
Cond | (? ...) | Tries to solve each form individually, like Or, but stops at the first successful value each time. If you keep clicking next, it will still not go to the next branch afterward. Essentially works like a chain of if's or Lisp's cond. |
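One way to picture the three behaviors is with Python generators, where each form is a function that takes the current bindings and yields every set of bindings that satisfies it. This is a sketch under assumptions, not the engine's real design: `conj_and`, `conj_or`, `conj_cond`, `bind`, and `require` are made-up names, and Cond is read here as returning only the first value of the first branch that succeeds.

```python
def conj_and(*goals):
    """Run goals in order, carrying bindings forward and backtracking as a unit."""
    def run(bindings):
        def solve(index, current):
            if index == len(goals):
                yield current
                return
            for extended in goals[index](current):
                yield from solve(index + 1, extended)
        yield from solve(0, bindings)
    return run


def conj_or(*goals):
    """Solve each goal on its own from the same starting bindings."""
    def run(bindings):
        for goal in goals:
            yield from goal(bindings)
    return run


def conj_cond(*goals):
    """Yield the first value of the first goal that succeeds, then stop."""
    def run(bindings):
        for goal in goals:
            first = next(goal(bindings), None)
            if first is not None:
                yield first
                return
    return run


def bind(name, *values):
    # Hypothetical goal with one solution per candidate value.
    def run(bindings):
        for value in values:
            yield {**bindings, name: value}
    return run


def require(predicate):
    # Hypothetical goal that passes bindings through unchanged or fails.
    def run(bindings):
        if predicate(bindings):
            yield bindings
    return run


both = conj_and(bind("X", 1, 2, 3), require(lambda b: b["X"] >= 2))
print(list(both({})))                                            # [{'X': 2}, {'X': 3}]
print(list(conj_or(bind("X", 1), bind("Y", 2))({})))             # [{'X': 1}, {'Y': 2}]
print(list(conj_cond(require(lambda b: False), bind("X", 1, 2), bind("X", 9))({})))  # [{'X': 1}]
```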
To allow for assigning arbitrary values to variables and doing rudimentary pattern matching, SEL has exposed the internal `=` function as `unify` in code. Before I cover how to actually use `=`, first I have to cover what variables look like, since variables feature prominently in unification (see above).
Variables in SEL are symbols that:
- are not surrounded in quotes
- do not have spaces (are space delimited)
- begin with a capital letter or a symbol
These variables are the same "variables" mentioned in the definition of unification at the beginning of this document.
Variables that begin with a `*` are reserved as database-wide constants (this will be covered later), and `...` and `@` are reserved for list destructuring patterns (also covered later).
The bare unification syntax is very simple but fairly powerful. Do note that it is in the conjugation form instead of the database lookup and rule calling form. This is actually something I intend to fix later, as unification is a holdout from an alternate design path. Some examples of bare unification syntax are:
To set `O` to one:
(= O 1)
Does the same thing as above (here binding `A` to `"a"`), but also confirms that the values around it are the same (note that this operates on lists!):
(= [1 2 A "b"] [1 2 "a" "b"])
Use a string:
(= Str "this is a string")
And so on. It's not particularly powerful at the moment since SEL lacks a way to destructure lists (which would allow iteration) but it's useful for unpacking and assignment currently.
At the moment, all numbers are interpreted as floating point numbers unless specifically input as integers from the GUI. All numbers inside SEL are considered floating point.
Expressions can be carried out on numbers (and variables with a number value) in SEL using the `{ ... }` syntax:
(= O { 4 / 2 }) #--> sets O to 2.0
(= F { O * 2 }) #--> sets F to 4.0
Note that this is a specially sandboxed version of Python's `eval` command behind the scenes, which only allows arithmetic expressions, but any valid arithmetic expression in Python is valid in SEL `{ ... }` syntax. Note that putting `print("foo")` and such things inside an `{ ... }` block will not work. An `expr` block is capable of using variables from SEL. You can have multiple `expr` "blocks" in a single line, or even a single form if you want to.
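To give a feel for what "arithmetic only" means, here is a minimal Python sketch of an arithmetic-only evaluator. It is not the sandbox Atomic Database uses (which the text says is built on `eval`); it swaps in a walk over the tree produced by Python's standard `ast` module, and `eval_expr` is a hypothetical name. It also assumes every number is converted to floating point.

```python
import ast
import operator

# Operators this sketch accepts; anything else is rejected.
_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv, ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def eval_expr(source, bindings=None):
    """Evaluate an arithmetic expression, reading variables from bindings."""
    bindings = bindings or {}

    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        if isinstance(node, ast.Name) and node.id in bindings:
            return float(bindings[node.id])
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](evaluate(node.operand))
        # Function calls, attribute access, strings, etc. all end up here.
        raise ValueError("only arithmetic expressions are allowed")

    return evaluate(ast.parse(source.strip(), mode="eval"))


print(eval_expr(" 4 / 2 "))              # 2.0
print(eval_expr("O * 2", {"O": 2.0}))    # 4.0
```

Calling `eval_expr('print("foo")')` raises `ValueError` in this sketch, because a function call is not one of the accepted node types.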
In SEL, if a rule fails it clears current bindings and triggers backtracking if any can be done. A final failure result is seen as a `No result` in the GUI. This happens if, as an example, you pass in all three values to an attribute query and one of those values is wrong. If bindings are unchanged, that means success, just that no new bindings had to be created. This happens when, for example, you supply all three values to an attribute query, and those values are correct. This is seen in the GUI as green text stating as much.
Comparisons in SEL act based on this method. So a comparison such as `(<= X 3)` will leave the current bindings unchanged if the comparison is true, or destroy the bindings (essentially returning `None` or `No`) and trigger backtracking (again, if possible). Supported comparison operators are `<`, `>`, `<=`, and `>=`. Note that `==` is not supplied because this can be trivially achieved using backtracking. Also note that a comparison is not a regular rule or predicate, and as such will not perform backtracking itself. Thus, if you pass in an unbound variable, it will not give you all the numbers that satisfy that comparison. This is a possible area for improvement for SEL.
An example is:
("some_entity" age X)
(>= X 44)
This would only return a non-fail result if "some_entity"'s age was 44 or greater. Note that the first and second arguments can both be either values or variables.
You could also do something like this:
(X age A)
(>= A 44)
Which would find all entities with an age of 44 or greater.
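The query above combines a lookup that has several possible answers with a comparison that only filters. A rough Python sketch of that interplay, using generators to stand in for backtracking, might look like this. The `FACTS` list, the entity names, and the `lookup`, `at_least`, and `query` functions are all invented for illustration, and the lookup assumes both variables start out free.

```python
# Hypothetical facts, stored as (entity, attribute, value) triples.
FACTS = [
    ("alice", "age", 30.0),
    ("bob", "age", 44.0),
    ("carol", "age", 61.0),
]


def lookup(attribute, entity_var, value_var, bindings):
    """A choice point: yields one extended set of bindings per matching fact."""
    for entity, attr, value in FACTS:
        if attr == attribute:
            yield {**bindings, entity_var: entity, value_var: value}


def at_least(var, limit, bindings):
    """A comparison: passes bindings through unchanged, or yields nothing."""
    if bindings[var] >= limit:
        yield bindings


def query():
    # Equivalent in spirit to: (X age A) followed by (>= A 44)
    for found in lookup("age", "X", "A", {}):
        yield from at_least("A", 44, found)


solutions = query()
print(next(solutions))   # {'X': 'bob', 'A': 44.0}
print(next(solutions))   # {'X': 'carol', 'A': 61.0}
```

Each call to `next` plays the role of clicking next in the GUI: the comparison itself never produces alternatives, so the generator resumes inside `lookup`, the most recent place that still has other candidates.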
Putting it all together, we can construct something like this:
(? (& (P age X) (<= X 55) (unify (O) ("youngish")))
(& (P age X) (<= X 70) (unify (O) ("pretty old")))
(unify (O) ("old")))
This example takes whatever entity `P` is set to, and sets `O` and `X` to the requisite values. Note at this time that if you put this in the query box without specifying `P`, it will only show you the value of `P` that satisfies that first form. Note that if it is written with the `(P age X)` outside of the `cond` statement, it is able to detect and perform backtracking correctly:
(P age X)
(? (& (<= X 55) (unify (O) ("youngish")))
(& (<= X 70) (unify (O) ("pretty old")))
(unify (O) ("old")))
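For readers who know a little Python, the second version behaves roughly like a loop over the lookup results with an `if` chain inside it. The sketch below is only an analogy: the `AGES` data and the `label_for` and `solutions` functions are made up for this example.

```python
def label_for(age):
    """Mirror the three cond branches: the first branch whose test passes wins."""
    if age <= 55:
        return "youngish"
    if age <= 70:
        return "pretty old"
    return "old"


# Hypothetical (entity, age) pairs standing in for the `(P age X)` lookup.
AGES = [("ann", 30.0), ("ben", 62.0), ("cho", 81.0)]


def solutions():
    # The lookup sits outside the cond, so every entity gets its own answer.
    for entity, age in AGES:
        yield {"P": entity, "X": age, "O": label_for(age)}


for answer in solutions():
    print(answer)
```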
Rules are defined in the "Rules Editor", which you can show from the `Window` menu of Atomic Database. There, you can click the new button to create a new rule, which you can name (common mistakes in the name will be fixed automatically). Each rule is a dropdown item, so if you click on it, it will toggle visibility of the rule's internals. There are 3 sections here. The first allows you to add or remove arguments and select the language you wish to write rules with. The second contains the textbox for your code in that rule, and a button labeled "Save Code". This button evaluates the text in the textbox, parsing it and turning it into an AST which is then stored to the database. Your rules will not be usable or accessible (or won't be updated if you are modifying a rule) until you click the "Save Code" button.
Rules can call themselves, and operate using lexical scope. In the future, database-wide constants will be made available as variables with the `*` prefix so that rules can have context beyond what arguments are passed in and what they output.
Lists are represented as space-delimited values in between square brackets (`[ ... ]`). Some examples of lists are:
[1 2 3]
[1 "a" "b"]
["a" "b" "c"]
[A "b" "c"]
Note that if variables are referenced inside a list, they are evaluated to their corresponding value when the list is evaluated, not when it is used. Furthermore, undefined variables are treated as symbols, just like regular lowercase symbols, instead of being an instant non-match, allowing for lists like:
[This is a list!]
Also note that lists can contain expressions, like so:
[{1 + 1} {2 * 2} {3 - 3}]
Currently, SEL supports two different kinds of list destructuring, plus the regular method of unification. In SEL, if a list is unified with another list, the unification iterates over both lists and unifies their elements. Only if all unifications are successful is the unification of both lists successful. If it is successful, the relevant bindings are returned. This is demonstrated in earlier sections.
The important thing to note, and what this section is about, is when a list contains the symbols `...` or `@`. If one of the lists (but not both) contains these symbols, that list is treated as a destructuring pattern. A destructuring pattern behaves like a unification, in that it binds variables, unifies values, unifies values with the values of variables, and only returns bindings if all unifications in the expression are successful. However, instead of performing an elementwise unification scheme, it does something a little different.
The `@` symbol's syntax looks like this:
(= [List @ ...pats...] ["a" "b" "c"])
Essentially, `@` denotes, and belongs to, a list destructuring pattern. It must be preceded by the name of a variable, which is bound to the entirety of the list the pattern is being matched against. It is then followed by the rest of the pattern it is inside, be it a regular unification list pattern (like in the Bare Unification section) with variables and values, or other types of destructuring descriptors. Note that there can only be one `@` descriptor in a given pattern!
Here's an example:
(= [List @ A B C] ["a" "b" "c"])
In this example, the variable `A` is bound to `"a"`, and so on, but the variable `List` is bound to `["a" "b" "c"]`.
The `...` descriptor can be used at the end of a list destructuring pattern, following any unification list patterns and `@` descriptors, and expects to be followed by a variable name to bind the rest of the list to. Essentially, if there are any list elements left, the variable following the `...` will be bound to them. To elaborate, here is what it would look like:
(= [A B ... T] ["a" "b" "c" "d"])
In this example, `A` is bound to `"a"`, and `B` is bound to `"b"`, but `T` is bound to a list of the leftovers of the list, namely `["c" "d"]`. This also works, of course:
(= [A "b" ... T] ["a" "b" "c" "d"])
Note that all descriptors can be combined together, as long as they are given what they expect. So this:
(= [List @ A B ... T] ["a" "b" "c" "d"])
works as expected.
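The two descriptors can be mimicked in Python to show how the pieces fit together. This is a sketch, not the real matcher: `Var`, the `AT` and `REST` markers, and `destructure` are names invented here, and it assumes that `...` binds its variable to an empty list when nothing is left over, which the text above does not spell out.

```python
class Var:
    """A pattern variable, identified by its name (hypothetical helper)."""
    def __init__(self, name):
        self.name = name


AT = object()     # stands in for the `@` descriptor
REST = object()   # stands in for the `...` descriptor


def destructure(pattern, values):
    """Return bindings if the pattern matches the list of values, else None."""
    bindings = {}
    pattern = list(pattern)
    # `List @ ...`: bind the leading variable to the entire list.
    if len(pattern) >= 2 and pattern[1] is AT:
        if not isinstance(pattern[0], Var):
            return None
        bindings[pattern[0].name] = list(values)
        pattern = pattern[2:]
    # `... T` at the end: remember the variable that takes the leftovers.
    rest_var = None
    if len(pattern) >= 2 and pattern[-2] is REST:
        if not isinstance(pattern[-1], Var):
            return None
        rest_var = pattern[-1]
        pattern = pattern[:-2]
    if any(item is AT or item is REST for item in pattern):
        return None  # misplaced or repeated descriptor
    if len(values) < len(pattern):
        return None
    if rest_var is None and len(values) != len(pattern):
        return None
    # The remaining pattern items are matched one by one.
    for item, value in zip(pattern, values):
        if isinstance(item, Var):
            if item.name in bindings and bindings[item.name] != value:
                return None
            bindings[item.name] = value
        elif item != value:
            return None
    if rest_var is not None:
        leftovers = list(values[len(pattern):])
        if bindings.get(rest_var.name, leftovers) != leftovers:
            return None
        bindings[rest_var.name] = leftovers
    return bindings


# Same shape as: (= [List @ A B ... T] ["a" "b" "c" "d"])
pattern = [Var("List"), AT, Var("A"), Var("B"), REST, Var("T")]
print(destructure(pattern, ["a", "b", "c", "d"]))
# {'List': ['a', 'b', 'c', 'd'], 'A': 'a', 'B': 'b', 'T': ['c', 'd']}
```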
Under construction.
Constants, as mentioned at the start, are referenced as variables beginning with a `*`. So for instance, `*foo` is a database-wide constant. Constants, unlike variables, are accessible from everywhere inside SEL, the NL engine, and rules. These are useful for things that everything needs to know about that don't fit neatly into the database, such as, for instance, the value of `pi`. Constants are set in a special graphical window, the `Const View`, and can only be accessed from inside any code.
A note on quoting. Strings are quoted with double quotes, like in Python or JavaScript. However, as in the Natural Language Query System, entity names should also be quoted! Although entity names are not necessarily strings, per se, they should be quoted to help the interpreter recognize them correctly. For entity names that would be a valid attribute or rule name, going without quotes should be fine. But for entity names that include special characters (beyond underscores), or entity names that would look like a variable name or include spaces please use quotes.
To read a code example, there are three things you need to know:
- Anything between angle brackets (`<` and `>`) is a placeholder. A placeholder means you can put a valid value in its place. Generally, the words between the angle brackets tell you what sort of thing goes there.
- Numbers between angle brackets mean that the thing the bracketed item stands in for is the `n`th argument. So `<1>` means the 1st argument, and so on.
- Ellipses (`...`) mean that you can put any number of things there. Sometimes, these are left out where the text explicitly says that multiple things can be put there.