Compis is a contemporary systems programming language in the spirit of C.
The compiler supports writing programs in mixed Compis and C; you can mix .c and .co source files. It accomplishes this by bundling some LLVM tools, like clang and lld into one executable. The LLVM dependency might go away someday.
"Compis" is a variation on the Swedish word "kompis", meaning "buddy" or "comrade." It's also a nod to one of the first computers I used as a kid.
Note: Compis is under development
Usage:
$ compis build -o foo main.co
$ ./foo
Since Compis bundles clang & lld, you can use it as a C and C++ compiler with cross-compilation capabilities:
$ compis cc hello.c -o hello
$ ./hello
Hello world
$ compis cc --target=aarch64-macos hello.c -o hello-mac
$ file hello-mac
hello-mac: Mach-O 64-bit executable arm64
$ compis c++ --target=wasm32-wasi hello.cc -o hello.wasm
$ wasmtime hello.wasm
Hello world
- Memory safety via ownership
let x *Thing = thing // move, not copy
- Simple: primitive values, arrays, structs and functions.
- Convention over configuration: Sensible defaults. One way to do something.
- Compiler is a statically-linked, single executable.
- Hackable pragmatic compiler (parse → type-check → static analysis → codegen → link)
Compis attempts to make local reasoning about code as easy as possible. There are no implicit type conversion, no side effects from language constructs and the lifetime constraints of a value is explicitly encoded in its type.
Here's a parts list:
- booleans
bool
(true, false
) - machine-sized integers
int, uint
(-123, 0xbeef, 0b1111011
) - size-specific integers
i8, u8, i16, u16, i32, u32, i64, u64
- size-specific floating-point numbers
f32, f64
(1.2
,1.e39
) - fixed-size arrays
[i32 3]
(let a = [1, 2, 3]
) - runtime-sized arrays
[i32]
(let a [i32] = [1, 2, 3]
) - unicode character literals
'A' == 0x0041, '↔' == 0x2194, '🆎' == 0x1F18E
(u32
) - ASCII byte-sized character literals
u8('A')
- structure types
struct MyThing { field1, field2 type; field3 type; }
- functions
fun name(x, y int, z str) int
(let s = name(1, 2, "three")
) - anonymous functions
fun (x, y t1, z t2) result
- type functions
fun MyThing.foo(this, x, y int) int
(var x MyThing; x.foo(1,2)
) - "optional" types
var x MyThing?; if x { typeof(x) == MyThing }
- variables
var x, y, z int; var a = 123; a = 456
- read-only bindings
let a = [1, 2, 3]
(values are mutablea[1] = 22
)
Compis manages the lifetime of values which are considered "owners." In other words, memory allocation and deallocation is automatically managed. There's an escape hatch ("unsafe" blocks & functions; unimplemented) for when you need precise control.
A type is either considered "copyable" or "owning":
-
Copyable types can be trivially copied without side effects. For example, all primitive types, like
int
andbool
, are copyable. The lifetime of a copyable value is simply the lifetime of its owning variable, parameter or expression. -
Owning types have managed, linear lifetimes. They can only be copied—as in diverge, or "fork" if you will—by explicit user code. When a value of owning type ceases to exist it is "dropped"; any
drop()
function defined for the type is called and if the type is stored in heap memory, it is freed.
Assigning a copyable value to variable (or using it as an rvalue expression in any situation, like passing it as an argument to a call) creates a distinct "deep" copy:
type Vec2 { x, y int }
type Line { start, end Vec2 }
var Line a
var b = a // 'a' is copied
b.start.x = 2
assert(a.start.x == 0)
Assigning an owning value to a variable moves the value; its previous owner becomes inaccessible and any attempt to use the old owner causes a compile-time error.
type Vec2 { x, y int }
type Line { start, end Vec2 }
// implementing "drop" for Vec2 makes is an "owning" type
fun Vec2.drop(mut this) {}
var Line a
var b = a // 'a's value moves to 'b'
b.start.x = 2
a.start.x // error: 'a' has moved
References are used for "lending" a value somewhere, without a change in storage or ownership.
Reference types are defined with a leading ampersand &T
and created with the &
prefix operation: &expr
.
type Vec2 { x, y int }
fun rightmost(a, b &Vec2) int {
// 'a' and 'b' are read-only references here
if a.x >= b.x { a.x } else { b.x }
}
fun main() {
var a = Vec2(1, 1)
var b = Vec2(2, 2)
rightmost(&a, &b) // => 2
}
Mutable reference types are denoted with the keyword "mut": mut&T
.
Mutable references are useful when you want to allow a function to modify a value without copying the value or transferring its ownership back and forth.
type Vec2 { x, y int }
fun translate_x(v mut&Vec2, int delta) {
v.x += delta
}
fun main() {
var a = Vec2(1, 1)
translate_x(&a, 2) // lends a mutable reference to callee
assert(a.x == 3)
}
Compis does not enforce exclusive mutable borrowing, like for example Rust does. This makes Compis a little more forgiving and flexible at the expense of aliasing; it is possible to have multiple pointers to a value which may change at any time:
type Vec2 { x, y int }
type Line { start, end mut&Vec2 }
fun main() {
var a = Vec2(1, 1)
var line = Line(&a, &a)
a.x = 2
assert(line.start.x == 2)
assert(line.end.x == 2) // same value
}
This may change and Compis may introduce "borrow checking" or some version of it, that enforces that no aliasing can occur when dealing with references. Mutable Value Semantics is another interesting idea on this topic.
References are semantically more similar to values than pointers: a reference used as an rvalue does not need to be "dereferenced" (but pointers do.)
fun add(x int, y &int) int {
let result = x + y // no need to deref '*y' here
assert(result == 2)
return result
}
The only situations where a reference needs to be "dereferenced" is when replacing a mutable reference to a copyable value with a new value:
var a = 1
var b mut&int = &a
*b = 2 // must explicitly deref mutable refs in assignment
assert(a == 2)
A pointer is an address to a value stored in long-term memory, "on the heap".
It's written as *T
.
Pointers are "owned" types and have the same semantics as regular "owned" values, meaning their lifetime is managed by the compiler.
Pointers can never be "null". A pointer value which may or may not hold a value is made optional, i.e. ?*T
.
Planned feature: unmanaged "raw" pointer
rawptr T
which can only be created or dereferenced inside "unsafe" blocks.
type Vec2 { x, y int }
fun translate_x(v mut&Vec2, int delta) {
v.x += delta
}
fun example(v *Vec2) {
v.x = 1 // pointer types are mutable
translate_x(&a, 2) // lends a mutable reference to callee
assert(v.x == 1)
// v's memory is freed here
}
In this example, Thing
is considered an "owning" type since it has a "drop" function defined. Compis will, at compile time, make sure that there's exactly one owner of a "Thing" (that it is not copied.)
type Thing {
x i32
}
fun Thing.drop(mut this) {
print("Thing dropped")
}
fun example(thing Thing) i32 {
return thing.x
} // "Thing dropped"
When the scope of an owning value ends that value is "dropped":
- If the type is optional and empty, do nothing, else
- If there's a "drop" type function defined for the type of value, that function is called to do any custom cleanup like closing a file descriptor.
- If the value has subvalues that are owners, like a struct field, those are dropped.
- If the value is heap-allocated, its memory is freed.
Compis has optional types ?T
rather than nullable types.
fun example(x ?i32)
// let y i32 = x // error
if x
// type of x is "i32" here, not "?i32"
let y i32 = x // ok
The compiler is clever enough to know when an optional value is definitely available or not, based on prior tests:
fun example(name ?str)
if name && name.len() > 0
print(name)
fun example2(a, b ?int) int
if a && b { a * b } else 0
fun example3(a, b ?int) int
if !(a && b) 0 else { a * b }
fun example4(a, b ?int) int
if a && b { a * b } else
a * 2 // error: a may be valid, but must be checked first
fun example5(a ?int) int
if !a { a * 2 } // error: a is definitely empty
You can use variable definitions with if
expressions, which is useful for taking some action on the result of a function call, without spamming your scope with temporary variables:
fun try_read_file(path str) ?str
if let contents = try_read_file("/etc/passwd")
// in here, the type of contents is str, not ?str
print(contents)
In the above example, contents
is only defined inside the "then" branch.
Accessing contents
in an "else" branch or outside the "then" branch either
leads to an "undefined identifier" error or referencing some other variable in
the outer scope.
Memory in Compis is always initialized. When no initializer or initial value is provided, memory is zeroed. Therefore, all types in Compis are valid when their memory is all zero.
type Vec3 { x, y, z f32 }
var v Vec3 // initialized to {0.0, 0.0, 0.0}
Compis has variables var
and one-time bindings let
. let
bindings are not variable, they can not be reassigned once defined to. The type can be omitted if a value is provided (the type is inferred from the value.) The value can be omitted for var
if type is defined.
var b i32 = 1 // a 32-bit signed integer
var a = 1 // type inferred as "int"
a = 2 // update value of a to 2
var c u8 // zero initialized
let d i64 = 1 // a 64-bit signed integer
let e = 1 // type inferred as "int"
e = 2 // error: cannot assign to binding
let f i8 // error: missing value
FUTURE: introduce a
const
type for immutable compile-time constants
FUTURE: support deferred binding, e.g.
let x i8; x = 8
Amazingly, Compis has functions! Here are a few examples:
fun mul_or_div(x, y int, mul bool) int
if mul x * y
else x / y
fun no_args() int { return 42 }
fun no_args_implicit_return() int { 42 }
fun no_result(message str) { print(message) }
fun inferred_result(x, y f32) = x * y
let f = fun(x, y int) = x * y
Compis doesn't have classes, but it has something called "type functions" that allows you to clearly associate functions with data:
type Account
id uint
name str
fun Account.isRoot(this) bool
this.id == 0
fun Account.setName(mut this, newname str) str
let prevname = .name
.name = newname
prevname
fun example()
let account Account
account.setName("anne")
To access a C ABI function you need to declare it in the compis source file
that accesses it using pub "C" fun ...
:
example/a.c
:
long mul_or_div(long x, long y, _Bool mul) {
return mul ? x*y : x/y;
}
example/b.co
:
pub "C" fun mul_or_div(x, y int, mul bool) int
fun example() int
mul_or_div(1, 2, true)
An array in Compis is a representation of a region of linear memory, interpreted as zero or more values of a uniform type. Arrays have ownership of their values.
There are two kinds of arrays with very similar syntax:
- Fixed-size arrays
[type size]
(e.g.[int 3]
for three ints) - Dynamic arrays
[type]
which can grow at runtime
var three_ints [int 3]
var some_bytes [u8]
three_ints[3] // compile error: out of bounds
some_bytes[3] // runtime panic: out of bounds
Fixed-size arrays are concretely represented simply by a pointer to the
underlying memory while dynamic arrays are represented by a triple
(capacity, length, data_pointer)
.
Slices are references to arrays. Like regular references, they can be mutable.
var three_ints [int 3]
var a &[int] = &three_ints
var b mut&[int] = &three_ints
Slices are concretely represented by a tuple (length, data_pointer)
.
Note that you can make references to arrays while retaining compile-time size information:
References to fixed-size arrays are concretely represented by a pointer to the array's underlying memory, rather than a slice construct:
var four_bytes [u8 4] // fixed-size array
var some_bytes [u8] // dynamically-sized array
var a = &four_bytes // mut&[u8 3] -- reference to array
var b = &some_bytes // mut&[u8] -- slice of array
var c &[u8] = &four_bytes // slice of array, from explicit type
When you don't specify the type, as in the case for a
and b
above,
Compis will chose the highest-fidelity type. For example, referencing a fixed-size array
yields a mutable array reference with the length encoded in the type.
This leads the most efficient machine code (just a pointer.)
However, you can specify a more narrow type or a compatible type of lower fidelity, as
in the case with c
above.
To understand the practical impact, consider the two following functions, one which takes a reference to an array and one which takes a slice:
fun decode_u32le_array_ref(bytes &[u8 4]) u32
u32(bytes[0]) |
u32(bytes[1]) << 8 |
u32(bytes[2]) << 16 |
u32(bytes[3]) << 24
fun decode_u32le_slice(bytes &[u8]) u32
u32(bytes[0]) |
u32(bytes[1]) << 8 |
u32(bytes[2]) << 16 |
u32(bytes[3]) << 24
The equivalent generated C code looks like this:
u32 decode_u32le_array_ref(const u8* bytes) {
return (u32)bytes[0] |
(u32)bytes[1] << 8u |
(u32)bytes[2] << 16u |
(u32)bytes[3] << 24u;
}
u32 decode_u32le_slice(struct { uint len; const u8* ptr; } bytes) {
if (bytes.len <= 3) __co_panic_out_of_bounds(); // bounds check
return (u32)bytes.ptr[0] |
(u32)bytes.ptr[1] << 8u |
(u32)bytes.ptr[2] << 16u |
(u32)bytes.ptr[3] << 24u;
}
Notice how for the "slice" version, two machine words are passed as arguments (length and address) instead of just one (address) as is the case with the "array_ref" version. Additionally, a bounds check is performed at runtime in the "slice" version.
Note: Currently the actual generated code is a little less elegant for the "slice" version as there are bounds checks for every access, not just one for the largest offset. In the future the Compis code generator will be more clever and perform a minimum amount of runtime checks.
Structures are used for organizing data. A structure have named fields and can be nested. Here's an example of declaring a named structure type:
type Thing
x, y f32
size uint
metadata // anonymous struct
debugname str
is_hidden bool
They can be composed, and the order they are defined in does not matter:
type Thing
x, y f32
size uint
metadata Metadata
type OtherThing
name str
metadata Metadata
type Metadata
debugname str
is_hidden bool
Template types (generic types) are useful when describing shapes of data with
incomplete details.
They are also useful for reusing the same structures for many other types.
Here's an example where the same type Vector
is used to implement two functions
that accept different types of primitive values (int
and f32
):
type Vector<T>
x, y T
fun example1(point Vector<int>) int
point.x * point.y
fun example2(point Vector<f32>) f32
point.x * point.y
Template parameters can define default values:
type Foo<T,Offs=uint>
value T
offs Offs
var a Foo<int> // is really Foo<int,uint>
Templates can be partially expanded:
type Foo<T,Offs=uint>
value T
offs Offs
type Bar<T>
small_foo Foo<T,u8> // Foo partially expanded
var a Bar<int> // Bar and Foo fully expanded
Currently only types can be templates, not yet functions, but I'll add that.
It is also not yet possible to define type functions for template types
(i.e. fun Foo<T>.bar(this)
does not compile.)
A package in Compis is a directory of files and identified by its file path. Declarations in any file is visible in other files of the same package. For example:
$ cat foo/a.co
fun a(x int) int
x * 2
$ cat foo/b.co
fun b(x int) int
a(x) // the "a" function is visible here
$ compis build ./foo
[foo 6/6] build/opt-x86_64-linux/bin/foo
The only exception to this rule is imported symbols, which are only visible within the same source file:
$ cat foo/a.co
import "std/runtime" { print }
// here, "print" is only available in this source file
fun say_hello()
print("Hello")
$ cat foo/b.co
pub fun main()
say_hello() // ok; say_hello is available
print("Hello") // error; print is not available here
$ compis build ./foo
foo/b.co:3:3: error: unknown identifier "print"
3 → │ print("Hello") // error; print is not available here
│ ~~~~~
As a convenience, Compis can build "ad-hoc packages" out of arbitrary source files:
$ compis build somedir/myprog.co
[somedir/myprog 4/4] build/opt-x86_64-macos.10/bin/myprog
import "foo/bar" // package available as "bar"
import "foo/bar" as lol // package available as "lol"
import "./mysubpackage" // relative to source file's directory
import "cat" { x, y } // import only members x and y
import "cat" { x, y as y2 } // change name of some members
import "cat" as c { x, y } // import both package and some members
import "html" { * } // import all members
import "html" { *, a as A } // import all members, change name of some
List of members is a block which contains lists (comma separated names), which allows us to write it in different ways depending on the amount of members we are importing. This can help readability and improved version control diffs. The following three are equivalent:
import "fruit" { apple, banana, citrus, peach as not_a_plum }
import "fruit"
apple
banana
citrus
peach as not_a_plum
import "fruit" { apple,
banana, citrus,
peach as not_a_plum
}
Similarly—as a convenience—imports themselves can be grouped in a block:
import
"foo/bar"
"foo/bar" as lol
"./mysubpackage"
"cat" { x, y as y2 }
"html" { * }
"fruit"
apple
banana
citrus
peach as not_a_plum
// equivalent to:
import "foo/bar"
import "foo/bar" as lol
import "./mysubpackage"
import "cat" { x, y as y2 }
import "html" { * }
import "fruit" { apple, banana, citrus, peach as not_a_plum }
When a package's namespace is imported and no explicit name is given with as name
,
the package's identifier is inferred from its last path component:
import "foo/abc" // imported as identifier "abc"
import "foo/a b c" // imported as identifier "c"
import "foo/a-b-c" // imported as identifier "c"
If the last path component is not a valid identifier as name
is required:
import "foo/abc!" // error: cannot infer package identifier
import "foo/abc!" as abc // imported as identifier "abc"
To import a package only for its side effects, without introducing any identifiers
into the source file scope, name it _
:
"test/check-that-random-works" as _
Examples of invalid import paths:
import "" // error: invalid import path (empty path)
import "/" // error: invalid import path (absolute path)
import "." // error: invalid import path (cannot import itself)
import " " // error: invalid import path (leading whitespace)
import "a:" // error: invalid import path (invalid character)
Imports syntax specification:
importstmt = "import" (importgroup | importspec) ";"
importgroup = "{" (importspec ";")+ "}"
importspec = posixpath ("as" id)? membergroup?
membergroup = "{" (memberlist ";")+ "}"
memberlist = member ("," member)*
member = "*" | id ("as" id)?
Compis has the following primitive types:
TYPE VALUES
void nothing (can only be used as function-result type)
bool false or true
i8 -128 … 127
i16 -32768 … 32767
i32 -2147483648 … 2147483647
i64 -9223372036854775808 … 9223372036854775807
u8 0 … 255
u16 0 … 65535
u32 0 … 4294967295
u64 0 … 18446744073709551615
int Target-specific address size (i.e. i8, i16, i32 or i64)
uint Target-specific address size (i.e. u8, u16, u32 or u64)
f32 -16777216 … 16777216 (integer precision)
f64 −9007199254740992 … 9007199254740992 (-2e53 … 2e53; integer precision)
Integers are represented as two's complement. Floating-point numbers are represented as IEEE 754. Overflow of signed integers causes a runtime panic. Overflow of unsigned integers wrap.
fun inferred_int_types()
let _ = 1_23_4 // int (32-bit target) int (64-bit target)
let _ = -1 // int (32-bit target) int (64-bit target)
let _ = 2147483647 // int (32-bit target) int (64-bit target)
let _ = -2147483648 // int (32-bit target) int (64-bit target)
let _ = 4294967295 // uint (32-bit target) int (64-bit target)
let _ = -2147483649 // i64 (32-bit target) int (64-bit target)
let _ = 4294967296 // u64 (32-bit target) uint (64-bit target)
fun integer_limits()
let _ i8 = -0x80
let _ u8 = 255
let _ i16 = -32768
let _ u16 = 65535
let _ i32 = -2147483648
let _ u32 = 4_294_967_295
let _ i64 = -9223372036854775808
let _ u64 = 0xffff_ffff_ffff_ffff
Rather than literals being of a fixed type, Compis features "ideally typed" integer constants; the actual type is inferred from context and its value is verified to be within range:
fun foo(v i32) void
fun bar(v u16) void
fun cat(v i8) void
fun example()
foo(200) // materialized as i32
bar(200) // materialized as u16
//cat(200) // error: integer constant overflows i8
There's no special type for representing text runes in Compis.
Instead you deal with either u32
for Unicode codepoints,
or u8
for UTF-8 encoded sequences.
However Compis has syntax for character literals. Character literals must be valid Unicode codepoints.
fun example()
let _ = 'a' // 0x61
let _ = '\n' // 0x0A
let _ = '©' // 0xA9 (encoded as UTF-8 C2 A9)
let _ = '\u00A9' // 0xA9
//let _ = '\xA9' // error: invalid UTF-8 sequence
//let _ = '' // error: empty character literal
If you feel the need to specify character values which are not Unicode codepoints, use regular integers:
fun example()
let _ = 0xA9
let _ = 0xFF
let _ = 0xFFFFFFFF
Strings represent UTF-8 text. The type str
is an alias of &[u8]
(slice of bytes.)
String literals are high-fidelity types, embedding the size in the type (e.g. &[u8 5]
).
fun foo(s str)
fun bar(s &[u8])
fun cat(s &[u8 5])
fun mak(s &[u8 6])
fun example()
let s = "hello" // &[u8 5]
foo(s)
bar(s)
cat(s)
mak(s) // error: passing value of type &[u8 5] to parameter of type &[u8 6]
Build an optimized product:
./build.sh
Build a debug product:
./build.sh -debug
Run co:
out/debug/co build -o out/hello examples/hello.c examples/foo.co
out/hello
Build & run debug product in continuous mode:
./build.sh -debug -wf=examples/foo.co \
-run='out/debug/co build examples/hello.c examples/foo.co && build/debug/main'