cEP 23: Separation of bears' metadata

Closes coala#138
yukiisbored · May 10, 2018 · da6a447 · da6a447
1 parent c1a4b7e
commit da6a447
Showing 1 changed file with 378 additions and 0 deletions.
diff --git a/cEP-0023.md b/cEP-0023.md
@@ -0,0 +1,378 @@
+# Separation of bears' metadata
+
+| Metadata |                                               |
+|----------|-----------------------------------------------|
+| cEP      | 0023                                          |
+| Version  | 1.0                                           |
+| Title    | Separation of bears' metadata                 |
+| Authors  | Muhammad Kaisar Arkhan <mailto:[email protected]> |
+| Status   | Proposed                                      |
+| Type     | Feature                                       |
+
+## Abstract
+
+This cEP proposes a way to separate bears' metadata from their code, so that
+declaring a bear does not require Python.
+
+## How bears are written currently
+
+Most bears consist of Python boilerplate containing the metadata coala needs,
+some more metadata identifying the bear, and docstrings for the bear
+description.
+
+[GoVetBear][GoVetBear]
+
+Of course not all bears are just boilerplate code. Some require Python code to
+help coala execute the linters, parse logs, make configuration files, etc.
+
+[CoffeeLintBear][CoffeeLintBear]
+
+Some bears are implemented in-house by the coala team.
+
+[SpaceConsistencyBear][SpaceConsistencyBear]
+
+## Problems with the current way of writing bears
+
+### Duplicate code all over the place
+
+When writing a bear, you have to copy the Python boilerplate and fill in the
+bear's metadata, so the same code is repeated in every bear.
+
+This is annoying when a new feature deprecates the old way of doing things:
+we have to change the code of almost every bear.
+
+[Example 1][Example 1]
+
+### Python is not needed
+
+Bears such as [GoVetBear][GoVetBear] don't need Python to declare metadata.
+
+The `@linter` decorator helps suppress a lot of boilerplate code, but it
+still requires Python just to declare metadata.
+
+Some projects/orgs may need to write their own bears so coala can use their
+exclusive tools (such as commercial code safety checks that are commonly used
+by embedded software projects).
+
+Not all projects/organizations want snippets of Python code in their projects
+just to declare how to use a linter, and not everyone can write Python.
+
+### Development is slow
+
+This is specific to bears that are made in-house or require a lot of fancy
+code to run.
+
+When writing a bear, we have to test it.
+
+This requires setting up a coala development environment, making sure
+coala-bears isn't installed (or declaring the bears directory, which may
+result in a conflict), and running coala with a long list of arguments or
+writing a `.coafile`.
+
+Alternatively, do it the other way around: write the tests first and just run
+`py.test` to test your fresh new bear.
+
+Either way, a lot of time goes into just testing a bear during development.
+You should not need to write a lot of unnecessary boilerplate code just to
+run the bear ad hoc. It should be as simple as running it in your shell.
+
+### Dual functionality of bears
+
+Are bears linters or are they just metadata to instruct coala to run linters?
+
+Should bears just declare metadata and have the code that makes them
+coala-able separated?
+
+This has been an issue for a while and it generates inconsistencies all over
+the place.
+
+Some bears need extra code to generate configuration files, such as
+[CoffeeLintBear][CoffeeLintBear].
+
+Some bears contain the linter code themselves, such as
+[SpaceConsistencyBear][SpaceConsistencyBear].
+
+Some of the Python bears just call the linter's functions directly, such as
+[PEP8Bear][PEP8Bear].
+
+I believe bears should simply be metadata, while the actual linter tool should
+be separated from them.
+
+Extra code, such as code that generates config files, can easily be moved into
+an external script.
+
+### Dependency Hell
+
+Tracking coala and coala-bears has been a problem. coala and coala-bears must
+be released together, and releases are quite slow because coala needs a lot of
+changes, while bears should be able to be released quickly.
+
+This holds back a lot of new bears and bug fixes.
+
+coala-bears should have a steady and frequent release cycle so people can
+enjoy bug fixes and new bears without coala development holding them back.
+
+Sadly this is hard to do because coala-bears is a bunch of Python code that
+calls things from coala that may or may not be there.
+
+This creates a dependency cycle between coala and coala-bears that should
+not be ignored.
+
+### Security
+
+When bear code runs inside the context of the coala process, it is possible
+to introduce bugs that have access to the coala process.
+
+This is bad since such bugs can leak information and possibly allow code
+execution. In theory, this makes it possible to exploit services such as
+continuous integration, or other specific uses of coala, and leak information
+such as secret keys for deployment (for example, to the Play Store).
+
+coala should simply run linters in a separated manner. It should not run
+them inside the same context.
+
+If we treat bears as just metadata, implementing good security practices such
+as privilege separation, operating-system-specific mitigations, and many more
+becomes possible and much easier.
+
+## Objective
+
+coala-bears can be simplified by an order of magnitude if it is treated as a
+repository filled with metadata that instructs coala on how to use linters.
+coala-bears should operate independently of coala development, enabling a
+faster release cycle and delivering bug fixes and new bears faster.
+
+## Structure of Bears
+
+Collections of bears will be put inside the directories declared in
+`$COALA_BEAR_PATH`, with defaults such as
+`$HOME/.coala/bears:/usr/local/lib/coala/bears:/usr/lib/coala/bears`. In
+addition, a project may contain a local `.coala` directory, with its bears
+located inside `.coala/bears`.
+
+```
+ /usr/local/lib/coala/bears
+...
+ |
+ |_ GoVetBear
+ |  |_ metadata.toml
+ |
+ |_ CoffeeLintBear
+ |  |_ metadata.toml
+ |  |_ bear.py
+ |  |_ generate_config.py
+ |
+ |_ SpaceConsistencyBear
+ |  |_ metadata.toml
+ |  |_ bear.py
+ |
+ |_ PEP8Bear
+ |  |_ metadata.toml
+ |  |_ bear.py
+...
+
+ .coala/bears
+ |_ AeroplaneSafetyComplianceBear
+ |  |_ metadata.toml
+ |
+ |_ MemoryStructureFormatBear
+    |_ metadata.toml
+    |_ check_memory_structure.sh
+```
+
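The lookup described above could be sketched roughly as follows. This is only an illustration: the helper names `bear_search_dirs` and `find_bears` are hypothetical, and the proposal does not say in which order the directories are searched, so the sketch assumes the project-local `.coala/bears` directory comes first and that the first match for a bear name wins.

```python
import os
from pathlib import Path

# Default search path quoted from the proposal; used when the variable is unset.
DEFAULT_BEAR_PATH = (
    "$HOME/.coala/bears:/usr/local/lib/coala/bears:/usr/lib/coala/bears"
)


def bear_search_dirs(project_dir, environ=None):
    """Return candidate bear directories in lookup order (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    raw = environ.get("COALA_BEAR_PATH", DEFAULT_BEAR_PATH)
    home = environ.get("HOME") or os.path.expanduser("~")
    # The proposal separates entries with ":"; empty entries are skipped.
    dirs = [Path(entry.replace("$HOME", home))
            for entry in raw.split(":") if entry]
    # Assumption: the project-local directory is searched before the others.
    return [Path(project_dir) / ".coala" / "bears"] + dirs


def find_bears(search_dirs):
    """Map bear name -> directory for each folder holding a metadata.toml."""
    bears = {}
    for directory in search_dirs:
        if not directory.is_dir():
            continue
        for child in sorted(directory.iterdir()):
            if (child / "metadata.toml").is_file():
                bears.setdefault(child.name, child)  # first match wins
    return bears
```

Calling `find_bears(bear_search_dirs("."))` would then list every bear visible from the current project under these assumptions.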
+The `metadata.toml` file will declare the metadata required to instruct coala
+on how to use the tool, what arguments to give when executing it, what
+dependencies are required, etc.
+
+A script or an executable can be added inside the folder. It does not need
+coala in order to execute, which removes the dependency cycle.
+
+The script will be launched using a general fork+exec model to prevent the
+script from doing malicious things inside the context of coala.
+
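As a rough illustration of running a bear outside the coala process, the sketch below uses Python's standard `subprocess` module to start the executable as a child process and collect its output. The helper name `run_bear` and the timeout value are assumptions made for this example; the proposal does not define how coala would implement the launch.

```python
import subprocess


def run_bear(argv, timeout=60):
    """Run a bear executable as a separate child process (hypothetical helper).

    The bear's code never runs inside the calling process; only its exit
    code and captured output streams come back.
    """
    completed = subprocess.run(
        argv,
        capture_output=True,  # collect stdout and stderr instead of inheriting
        text=True,            # decode the output as text
        timeout=timeout,      # assumption: give up on a bear that hangs
        check=False,          # linters commonly exit non-zero on findings
    )
    return completed.returncode, completed.stdout, completed.stderr
```

For example, `run_bear(["go", "vet"])` would return the exit code together with whatever the tool wrote to stdout and stderr.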
+This also enables coala itself to add more safety features, such as operating
+system specific mechanisms (FreeBSD Capsicum, OpenBSD pledge, Linux seccomp,
+etc.) and more fine-grained privilege separation. However, those aren't part
+of this cEP and will be covered another time.
+
+## `metadata.toml`
+
+`metadata.toml` is essentially a TOML file declaring the needed information for
+coala.
+
+TOML is chosen since it has enough features to do what we want. We may need to
+research whether ini files are good enough, since support for those is already
+in Python's standard library.
+
+Here are a couple of examples:
+
+**GoVetBear/metadata.toml**
+```toml
+[identity]
+name = "GoVetBear"
+description = """\
+              Analyze Go code and raise suspicious constructs, such as printf calls \
+              whose arguments do not correctly match the format string, useless \
+              assignments, common mistakes about boolean operations, unreachable code, \
+              etc.\
+              """
+languages = ["Go"]
+authors = ["The coala developers"]
+authors_email = ["[email protected]"]
+license = "AGPL-3.0"
+can_detect = ["Unused code", "Smell", "Unreachable Code"]
+
+[[requirements]]
+type = "AnyOneOf"
+
+    [[requirements.child]]
+    type = "binary"
+    name = "go"
+
+    [[requirements.child]]
+    type = "apt"
+    name = "golang"
+
+[[requirements]]
+type = "GoRequirement"
+package = "golang.org/cmd/vet"
+flag = "-u"
+
+[run]
+executable = "go"
+arguments = "vet"
+use_stdout = false
+use_stderr = true
+output_format = "regex"
+# Literal (single-quoted) string, so that \d is not read as a TOML escape.
+output_regex = '.+:(?P<line>\d+): (?P<message>.*)'
+```
+
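To illustrate how the `output_regex` in the GoVetBear example might be applied, here is a minimal sketch that matches each line of the tool's stderr against the pattern and extracts the named groups. The function name `parse_results` and the sample output line are assumptions for illustration, not real `go vet` output or coala API.

```python
import re

# Same pattern as `output_regex` in the GoVetBear example above.
OUTPUT_REGEX = re.compile(r".+:(?P<line>\d+): (?P<message>.*)")


def parse_results(stderr_text):
    """Yield (line, message) pairs for every output line the regex matches."""
    for raw_line in stderr_text.splitlines():
        match = OUTPUT_REGEX.match(raw_line)
        if match:
            yield int(match.group("line")), match.group("message")


if __name__ == "__main__":
    # Hypothetical sample output; lines that do not match are skipped.
    sample = "main.go:12: unreachable code\nexit status 1\n"
    for line, message in parse_results(sample):
        print(line, message)
```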
+**SpaceConsistencyBear/metadata.toml**
+```toml
+[identity]
+name = "SpaceConsistencyBear"
+description = """\
+              Check and correct spacing for all textual data. This includes usage of \
+              tabs vs. spaces, trailing whitespace and (missing) newlines before \
+              the end of the file.\
+              """
+languages = ["All"]
+authors = ["The coala developers"]
+authors_email = ["[email protected]"]
+license = "AGPL-3.0"
+can_detect = ["Formatting"]
+
+[[params]]
+name = "use_spaces"
+description = "True if spaces are to be used instead of tabs."
+type = "bool"
+
+[[params]]
+name = "allow_trailing_whitespace"
+description = "Whether to allow trailing whitespace or not."
+type = "bool"
+default = false
+
+[[params]]
+name = "indent_size"
+description = "Number of spaces per indentation level"
+type = "int"
+default = 8
+
+[[params]]
+name = "enforce_newline_at_EOF"
+description = "Whether to enforce a newline at the end of file"
+type = "bool"
+default = true
+format = "enforce-newline={}"
+
+[run]
+executable = "bear.py"
+local = true
+use_coala_logging_style = true
+```
+
+As you can see from the SpaceConsistencyBear example, it is treated not as
+Python code running under coala but as if it were its own linter. The `local`
+variable indicates that the file is inside the bear's directory and not in
+`$PATH`, and the `use_coala_logging_style` variable tells coala that the
+executable uses the common log format.
+
+Parameters will be given to the process via command-line arguments when
+launching. With the defaults of the above example, this results in the
+following command:
+
+```sh
+/usr/local/lib/coala/bears/general/SpaceConsistencyBear/bear.py \
+	--allow_trailing_whitespace=false \
+	--indent_size=8 \
+	enforce-newline=true
+```
+
+The above example is formatted for readability; the real command will be on a
+single line.
+
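One possible reading of how the `[[params]]` entries turn into the command above is sketched below. It assumes that a parameter without a value or default is left out (as `use_spaces` is in the example), that booleans are rendered as `true`/`false`, and that a `format` key replaces the default `--name=value` form. The names `build_argv`, `render_value` and the `settings` argument for user-supplied values are hypothetical.

```python
# The [[params]] tables of the SpaceConsistencyBear example, as a TOML parser
# would typically hand them over: a list of dictionaries.
SPACE_CONSISTENCY_PARAMS = [
    {"name": "use_spaces", "type": "bool"},
    {"name": "allow_trailing_whitespace", "type": "bool", "default": False},
    {"name": "indent_size", "type": "int", "default": 8},
    {"name": "enforce_newline_at_EOF", "type": "bool", "default": True,
     "format": "enforce-newline={}"},
]


def render_value(value):
    """Render a value the way the example command does (true/false, 8)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_argv(executable, params, settings=None):
    """Build the argument list for a bear executable (hypothetical helper)."""
    settings = settings or {}
    argv = [executable]
    for param in params:
        name = param["name"]
        if name in settings:
            value = settings[name]
        elif "default" in param:
            value = param["default"]
        else:
            continue  # assumption: no value and no default means "omit"
        text = render_value(value)
        if "format" in param:
            argv.append(param["format"].format(text))
        else:
            argv.append(f"--{name}={text}")
    return argv
```

With no user settings, `build_argv("bear.py", SPACE_CONSISTENCY_PARAMS)` yields the executable followed by `--allow_trailing_whitespace=false`, `--indent_size=8` and `enforce-newline=true`, matching the command shown above.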
+**CoffeeLintBear/metadata.toml**
+```toml
+[identity]
+name = "CoffeeLintBear"
+description = "Check CoffeeScript for a clean and consistent file"
+url = "http://www.coffeelint.org"
+languages = ["CoffeeScript"]
+authors = ["The coala developers"]
+authors_email = ["[email protected]"]
+license = "AGPL-3.0"
+can_detect = ["Syntax", "Formatting", "Smell", "Complexity", "Duplication"]
+
+[severity_map]
+normal = "warn"
+major = "error"
+info = "ignore"
+
+[[requirements]]
+type = "binary"
+name = "coffeelint"
+
+[[params]]
+name = "max_line_length"
+description = "Maximum number of characters per line."
+type = "int"
+default = 79
+
+...
+
+[prerun]
+executable = "generate_config.py"
+local = true
+use_coala_logging_style = true
+
+[run]
+executable = "bear.py"
+ignore_params = true
+local = true
+use_coala_logging_style = true
+```
+
+The CoffeeLintBear example above shows what the metadata will look like if a
+bear requires special treatment such as generating configuration files and
+translating the output of the linter.
+
+If a bear requires some special treatment after the linter is executed, a
+`postrun` section can be added as well.
+
+The `prerun` and `postrun` sections will have the same format as the `run`
+section.
+
+## Process
+
+TODO
+
+[GoVetBear]: https://github.com/coala/coala-bears/blob/3cb9b148adc0dda51ac890188b38fd968f6058fd/bears/go/GoVetBear.py
+[CoffeeLintBear]: https://github.com/coala/coala-bears/blob/3cb9b148adc0dda51ac890188b38fd968f6058fd/bears/coffee_script/CoffeeLintBear.py
+[SpaceConsistencyBear]: https://github.com/coala/coala-bears/blob/3cb9b148adc0dda51ac890188b38fd968f6058fd/bears/general/SpaceConsistencyBear.py
+[PEP8Bear]: https://github.com/coala/coala-bears/blob/c5a5e201a42c44c159b9c118b062417e4ae4b17f/bears/python/PEP8Bear.py
+[Example 1]: https://github.com/coala/coala-bears/commit/3cb9b148adc0dda51ac890188b38fd968f6058fd