cEP 23: Separation of bears' metadata
Closes coala#138
yukiisbored committed May 10, 2018
1 parent c1a4b7e commit da6a447
Showing 1 changed file with 378 additions and 0 deletions.
378 changes: 378 additions & 0 deletions cEP-0023.md
# Separation of bears' metadata

| Metadata | |
|----------|-----------------------------------------------|
| cEP | 0023 |
| Version | 1.0 |
| Title | Separation of bears' metadata |
| Authors | Muhammad Kaisar Arkhan <mailto:[email protected]> |
| Status | Proposed |
| Type | Feature |

## Abstract

This cEP proposes a method of separating bears' metadata from their code and
removing the need for Python when writing bears.

## How bears are written currently

Most bears consist of Python boilerplate code that holds the metadata coala
needs, some further metadata that identifies the bear, and a docstring for the
bear's description.

[GoVetBear][GoVetBear]

Of course not all bears are just boilerplate code. Some require Python code to
help coala execute the linters, parse logs, make configuration files, etc.

[CoffeeLintBear][CoffeeLintBear]

Some bears are written in-house by the coala team.

[SpaceConsistencyBear][SpaceConsistencyBear]

## Problems with current way of writing bears

### Duplicate code all over the place

Duplicated code makes it tedious to introduce a new feature that deprecates
old methods.

When writing a bear, you have to copy the Python boilerplate and fill in the
metadata.

When a new feature deprecates the old way of doing things, we have to change
the code of almost every bear.

[Example 1][Example 1]

### Python is not needed

Bears such as [GoVetBear][GoVetBear] don't need Python to declare metadata.

The `@linter` decorator helps suppress a lot of boilerplate code, but it still
has the issue of requiring Python just to declare metadata.

Some projects and organizations may need to write their own bears so coala can
use their exclusive tools (such as the commercial code safety checks that are
commonly used by embedded software projects).

Not every project or organization wants snippets of Python code in its
repositories just to declare how to use a linter, and not everyone can write
Python.

### Development is slow

This is specific to bears that are made in-house or require a lot of custom
code to run.

When writing a bear, we have to test it.

This requires setting up a coala development environment, making sure
coala-bears isn't installed or declaring the bears directory (which may result
in a conflict), and running coala with a long list of arguments or writing a
`.coafile`.

Alternatively, you can do it the other way around: write the tests first and
run `py.test` to test your fresh new bear.

Either way, both approaches add a lot of time to testing a bear during
development. You shouldn't need to write a lot of unnecessary boilerplate code
just to run the bear ad hoc. It should be as simple as running it in your
shell.

### Dual functionality of bears

Are bears linters or are they just metadata to instruct coala to run linters?

Should bears just declare metadata and have the code that makes them coala-able
separated?

This has been an issue for a while and it generates inconsistencies all over
the place.

Some bears need extra code to generate configuration files, such as
[CoffeeLintBear][CoffeeLintBear].

Some bears contain the checking code themselves, such as
[SpaceConsistencyBear][SpaceConsistencyBear].

Some of the Python bears just call the functions of a library, such as
[PEP8Bear][PEP8Bear].

I believe bears should simply be metadata, and the actual linter tool should
be separated from them.

Extra code such as config file generation can easily be moved into an
external script.

### Dependency Hell

Tracking coala and coala-bears has been a problem. coala and coala-bears must
be released together, and releases are quite slow because coala needs a lot of
changes, while bears should be able to be released sooner.

This holds back a lot of new bears and bug fixes.

coala-bears should have a steady and frequent release cycle so people can enjoy
bug fixes and new bears without coala development holding them back.

Sadly this is hard to do because coala-bears is a collection of Python code
that calls things in coala that may or may not be there.

This creates a dependency cycle between coala and coala-bears that should
not be ignored.

### Security

When bear code runs inside the context of the coala process, it is possible to
introduce bugs that have access to the coala process.

This is bad because such bugs can leak information and possibly allow code
execution. In theory, this lets services that use coala, such as continuous
integration, be exploited to leak information such as secret keys for
deployment to the Play Store.

coala should simply run linters in a separate process. It should not run
them inside its own context.

If we treat bears as just metadata, good security practices such as privilege
separation, operating system specific mitigations, and many more become
possible and far easier to implement.

## Objective

coala-bears can be simplified by an order of magnitude if it is treated as a
repository of metadata that instructs coala on how to use linters.
coala-bears should operate independently of coala development, enabling a
faster release cycle that delivers bug fixes and new bears sooner.

## Structure of Bears

Collections of bears will be put inside the directories declared in
`$COALA_BEAR_PATH`, with defaults such as
`$HOME/.coala/bears:/usr/local/lib/coala/bears:/usr/lib/coala/bears`, in
addition to a possible local `.coala` directory inside the project, where bears
are located in `.coala/bears`.

```
/usr/local/lib/coala/bears
...
|
|_ GoVetBear
| |_ metadata.toml
|
|_ CoffeeLintBear
| |_ metadata.toml
| |_ bear.py
| |_ generate_config.py
|
|_ SpaceConsistencyBear
| |_ metadata.toml
| |_ bear.py
|
|_ PEP8Bear
| |_ metadata.toml
| |_ bear.py
...
.coala/bears
|_ AeroplaneSafetyComplianceBear
| |_ metadata.toml
|
|_ MemoryStructureFormatBear
|_ metadata.toml
|_ check_memory_structure.sh
```
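
The lookup described above could be sketched in Python as follows. This is only
an illustration: the function names `bear_search_dirs` and `find_bear` are
hypothetical, and the rule that the project-local `.coala/bears` directory is
searched before the entries of `$COALA_BEAR_PATH` is an assumption that this
cEP does not specify.

```python
import os
from pathlib import Path

# Default search path quoted from this cEP.
DEFAULT_BEAR_PATH = (
    "$HOME/.coala/bears:/usr/local/lib/coala/bears:/usr/lib/coala/bears"
)


def bear_search_dirs(project_dir, environ=None):
    """Return candidate bear directories in lookup order.

    Assumption: the project-local `.coala/bears` directory comes first.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("COALA_BEAR_PATH") or DEFAULT_BEAR_PATH
    home = environ.get("HOME", "")
    dirs = [Path(project_dir) / ".coala" / "bears"]
    # The cEP writes the path with ":" separators, so split on ":".
    for entry in raw.split(":"):
        if entry:
            dirs.append(Path(entry.replace("$HOME", home)))
    return dirs


def find_bear(name, project_dir, environ=None):
    """Return the first directory named `name` that holds a metadata.toml."""
    for base in bear_search_dirs(project_dir, environ):
        candidate = base / name
        if (candidate / "metadata.toml").is_file():
            return candidate
    return None
```

A call such as `find_bear("GoVetBear", ".")` would then return the directory of
the first matching bear, or `None` if no search directory contains one.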

The `metadata.toml` file will declare the metadata required to instruct coala on
how to use the tool, what arguments to give when executing it, what
dependencies it requires, etc.

Inside the folder, a script or an executable can be added that does not need
coala when executing, thus removing the dependency cycle.

The script will be launched using a general fork+exec model to prevent it
from doing malicious things inside the context of coala.

This enables coala itself to add more safety features, such as operating system
specific mechanisms (FreeBSD Capsicum, OpenBSD pledge, Linux seccomp, etc.) and
more fine-grained privilege separation. However, those aren't part of this cEP
and will be covered at another time.
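
As a rough illustration of the fork+exec model, the sketch below runs a bear's
executable as a child process through Python's `subprocess` module. The
function name `run_bear_process` and the 60 second timeout are assumptions made
for this example and are not part of the proposal.

```python
import subprocess
import sys


def run_bear_process(argv, cwd=None, timeout=60):
    """Run a bear executable as a child process and collect its output.

    `argv` is a list, so no shell is involved, and the child shares no
    Python state with the calling process.
    """
    completed = subprocess.run(
        argv,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        timeout=timeout,
        check=False,  # a non-zero exit code is reported, not raised
    )
    return completed.returncode, completed.stdout, completed.stderr


# Stand-in for a real bear: a child Python process that writes one line.
code, out, err = run_bear_process(
    [sys.executable, "-c", "print('example.txt:1: example message')"]
)
```

Because the bear only communicates through its exit code and output streams, a
crash or bug in the bear cannot touch objects inside the coala process.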

## `metadata.toml`

`metadata.toml` is essentially a TOML file declaring the needed information for
coala.

TOML is chosen since it has enough features to do what we want. We may need to
research whether ini files are good enough, since support for those is already
in Python's standard library.

Here are a couple of examples:

**GoVetBear/metadata.toml**
```toml
[identity]
name = "GoVetBear"
description = """\
Analyze Go code and raise suspicious constructs, such as printf calls \
whose arguments do not correctly match the format string, useless \
assignments, common mistakes about boolean operations, unreachable code, \
etc.\
"""
languages = ["Go"]
authors = ["The coala developers"]
authors_email = ["[email protected]"]
license = "AGPL-3.0"
can_detect = ["Unused code", "Smell", "Unreachable Code"]

[[requirements]]
type = "AnyOneOf"

[[requirements.child]]
type = "binary"
name = "go"

[[requirements.child]]
type = "apt"
name = "golang"

[[requirements]]
type = "GoRequirement"
package = "golang.org/cmd/vet"
flag = "-u"

[run]
executable = "go"
arguments = "vet"
use_stdout = false
use_stderr = true
output_format = "regex"
output_regex = '.+:(?P<line>\d+): (?P<message>.*)'
```
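
To illustrate how the `[run]` keys of this example might be consumed, here is a
minimal sketch that selects the output stream and applies `output_regex` line
by line. The helper names, the sample output, and the defaults assumed for
`use_stdout` and `use_stderr` are hypothetical. Only the pattern itself is
taken from the example above.

```python
import re

# Pattern copied from the `output_regex` key of the GoVetBear example.
OUTPUT_REGEX = re.compile(r".+:(?P<line>\d+): (?P<message>.*)")


def select_stream(run_section, stdout, stderr):
    """Pick the text to parse according to `use_stdout` and `use_stderr`.

    Assumption: stdout is used by default and stderr is not.
    """
    parts = []
    if run_section.get("use_stdout", True):
        parts.append(stdout)
    if run_section.get("use_stderr", False):
        parts.append(stderr)
    return "".join(parts)


def parse_output(text, pattern=OUTPUT_REGEX):
    """Turn every matching output line into a dict of named groups."""
    results = []
    for raw_line in text.splitlines():
        match = pattern.match(raw_line)
        if match is None:
            continue  # lines that do not match the pattern are skipped
        fields = match.groupdict()
        if fields.get("line") is not None:
            fields["line"] = int(fields["line"])
        results.append(fields)
    return results


# Made-up tool output, used only to exercise the pattern.
sample_stderr = "main.go:12: unreachable code\nexit status 1\n"
run_section = {"use_stdout": False, "use_stderr": True}
results = parse_output(select_stream(run_section, "", sample_stderr))
```

With the made-up output above, `results` holds a single entry for line 12, and
the `exit status 1` line is ignored because it does not match the pattern.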

**SpaceConsistencyBear/metadata.toml**
```toml
[identity]
name = "SpaceConsistencyBear"
description = """\
Check and correct spacing for all textual data. This includes usage of \
tabs vs. spaces, trailing whitespace and (missing) newlines before \
the end of the file.\
"""
languages = ["All"]
authors = ["The coala developers"]
authors_email = ["[email protected]"]
license = "AGPL-3.0"
can_detect = ["Formatting"]

[[params]]
name = "use_spaces"
description = "True if spaces are to be used instead of tabs."
type = "bool"

[[params]]
name = "allow_trailing_whitespace"
description = "Whether to allow trailing whitespace or not."
type = "bool"
default = false

[[params]]
name = "indent_size"
description = "Number of spaces per indentation level"
type = "int"
default = 8

[[params]]
name = "enforce_newline_at_EOF"
description = "Whether to enforce a newline at the end of file"
type = "bool"
default = true
format = "enforce-newline={}"

[run]
executable = "bear.py"
local = true
use_coala_logging_style = true
```

As the SpaceConsistencyBear example shows, the bear is treated not as Python
code running under coala but as if it were its own linter. The `local`
variable simply indicates that the file is inside the bear's directory and not
in `$PATH`, and the `use_coala_logging_style` variable tells coala that the
bear will use the common log format.

Parameters will be passed to the process as command-line arguments at launch.
With the defaults of the above example, the following command is executed:

```sh
/usr/local/lib/coala/bears/SpaceConsistencyBear/bear.py \
--allow_trailing_whitespace=false \
--indent_size=8 \
enforce-newline=true
```

The above example is formatted for reading; the real command will be on one
line.
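
The translation from `[[params]]` entries to command-line arguments could look
like the sketch below. It assumes that a parameter without a default and
without a user-supplied value is left out, that booleans are rendered in
lowercase, and that `format` replaces the default `--name=value` form. These
rules are inferred from the example command and are not stated explicitly in
this cEP. The function names are hypothetical.

```python
def format_value(value):
    """Render a parameter value the way the example command shows it."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_arguments(params, user_settings=None):
    """Build the argument list for a bear from its `[[params]]` entries."""
    user_settings = user_settings or {}
    arguments = []
    for param in params:
        name = param["name"]
        if name in user_settings:
            value = user_settings[name]
        elif "default" in param:
            value = param["default"]
        else:
            continue  # assumption: parameters without a value are omitted
        # Assumption: `format` overrides the default "--name=value" form.
        template = param.get("format", "--" + name + "={}")
        arguments.append(template.format(format_value(value)))
    return arguments


# The `[[params]]` tables of the SpaceConsistencyBear example, written as the
# list of dicts a TOML parser would produce (descriptions left out).
SPACE_CONSISTENCY_PARAMS = [
    {"name": "use_spaces", "type": "bool"},
    {"name": "allow_trailing_whitespace", "type": "bool", "default": False},
    {"name": "indent_size", "type": "int", "default": 8},
    {"name": "enforce_newline_at_EOF", "type": "bool", "default": True,
     "format": "enforce-newline={}"},
]

arguments = build_arguments(SPACE_CONSISTENCY_PARAMS)
```

Under these assumptions, `arguments` contains the same three arguments as the
example command, in the same order.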

**CoffeeLintBear/metadata.toml**
```toml
[identity]
name = "CoffeeLintBear"
description = "Check CoffeeScript for a clean and consistent file"
url = "http://www.coffeelint.org"
languages = ["CoffeeScript"]
authors = ["The coala developers"]
authors_email = ["[email protected]"]
license = "AGPL-3.0"
can_detect = ["Syntax", "Formatting", "Smell", "Complexity", "Duplication"]

[severity_map]
normal = "warn"
major = "error"
info = "ignore"

[[requirements]]
type = "binary"
name = "coffeelint"

[[params]]
name = "max_line_length"
description = "Maximum number of characters per line."
type = "int"
default = 79

...

[prerun]
executable = "generate_config.py"
local = true
use_coala_logging_style = true

[run]
executable = "bear.py"
ignore_params = true
local = true
use_coala_logging_style = true
```

The CoffeeLintBear example above shows what the metadata will look like if the
bear requires special treatment such as generating configuration files and
translating the output of the linter.

If a bear requires special treatment after the linter is executed, a `postrun`
section can be added as well.

The `prerun` and `postrun` sections will have the same format as the `run`
section.
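
One possible way to turn these sections into an ordered list of commands is
sketched below. Several points are assumptions made for illustration only: the
function names, splitting the `arguments` string with `shlex`, reading
`ignore_params` as "do not pass the parameters to this command", and passing
the parameters to every other phase.

```python
import os
import shlex

# The order in which the optional and mandatory sections are executed.
PHASES = ("prerun", "run", "postrun")


def resolve_executable(section, bear_dir):
    """Return the program to launch for one section.

    With `local = true` the file lives in the bear's own directory;
    otherwise the name is left for the system to look up in `$PATH`.
    """
    executable = section["executable"]
    if section.get("local", False):
        return os.path.join(bear_dir, executable)
    return executable


def planned_commands(metadata, bear_dir, param_arguments=()):
    """List (phase, argv) pairs for every phase present in the metadata."""
    commands = []
    for phase in PHASES:
        section = metadata.get(phase)
        if section is None:
            continue  # this phase is not declared for the bear
        argv = [resolve_executable(section, bear_dir)]
        extra = section.get("arguments")
        if extra:
            argv.extend(shlex.split(extra))
        if not section.get("ignore_params", False):
            argv.extend(param_arguments)
        commands.append((phase, argv))
    return commands


# The [prerun] and [run] tables of the CoffeeLintBear example as a dict.
coffeelint = {
    "prerun": {"executable": "generate_config.py", "local": True},
    "run": {"executable": "bear.py", "ignore_params": True, "local": True},
}
plan = planned_commands(
    coffeelint, "/usr/local/lib/coala/bears/CoffeeLintBear",
    ["--max_line_length=79"],
)
```

Here `plan` lists the `prerun` command first, with the parameter appended, and
then the `run` command without parameters because of `ignore_params`.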

## Process

TODO

[GoVetBear]: https://github.com/coala/coala-bears/blob/3cb9b148adc0dda51ac890188b38fd968f6058fd/bears/go/GoVetBear.py
[CoffeeLintBear]: https://github.com/coala/coala-bears/blob/3cb9b148adc0dda51ac890188b38fd968f6058fd/bears/coffee_script/CoffeeLintBear.py
[SpaceConsistencyBear]: https://github.com/coala/coala-bears/blob/3cb9b148adc0dda51ac890188b38fd968f6058fd/bears/general/SpaceConsistencyBear.py
[PEP8Bear]: https://github.com/coala/coala-bears/blob/c5a5e201a42c44c159b9c118b062417e4ae4b17f/bears/python/PEP8Bear.py
[Example 1]: https://github.com/coala/coala-bears/commit/3cb9b148adc0dda51ac890188b38fd968f6058fd
