From c58c073597cadebbfd051d36397416ce11333d27 Mon Sep 17 00:00:00 2001 From: Pranav Gaikwad Date: Thu, 2 Nov 2023 16:03:37 -0400 Subject: [PATCH] :books: update docs Signed-off-by: Pranav Gaikwad --- docs/README.md | 2 +- docs/development.md | 1 + docs/output.md | 65 +++++++++ docs/rules.md | 332 ++++++++++++++++++++++++-------------------- docs/violations.md | 66 --------- 5 files changed, 249 insertions(+), 217 deletions(-) create mode 100644 docs/development.md create mode 100644 docs/output.md delete mode 100644 docs/violations.md diff --git a/docs/README.md b/docs/README.md index c35145aa..60f71fd5 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,5 +2,5 @@ * [Providers](./providers.md) * [Rules](./rules.md) -* [Violations](./violations.md) +* [Output](./output.md) * [Rule Labels](./labels.md) \ No newline at end of file diff --git a/docs/development.md b/docs/development.md new file mode 100644 index 00000000..d66f3cb1 --- /dev/null +++ b/docs/development.md @@ -0,0 +1 @@ +# Development Guide \ No newline at end of file diff --git a/docs/output.md b/docs/output.md new file mode 100644 index 00000000..9ca9ccdb --- /dev/null +++ b/docs/output.md @@ -0,0 +1,65 @@ +# Analysis Output + +The analyzer engine generates output of the analysis in a YAML file specified by `--output-file` option in the CLI. + +## Output Structure + +The engine takes one or more _Rules_ or _Rulesets_ as input via the `--rules` option. See [passing rules as input](./rules.md#passing-rules-as-input) for more information. + +The YAML output of analysis contains a list with each item in the list being a [_Ruleset_](https://github.com/konveyor/analyzer-lsp/blob/0008c1e70ae770d9ca7f73a5b723ce0fa7688b69/output/v1/konveyor/violations.go#L14-L33) type. 
Each of these rulesets in the output corresponds to its respective input ruleset: + +```yaml +- name: ruleset-1 (1) + description: | (2) + Text description about ruleset 1 + tags: (3) + - tag1 + violations: (4) + rule-1: + + errors: (5) + rule-2: "failed to evaluate" + unmatched: (6) + - rule-2 + skipped: (7) + - rule-3 +``` + +1. **name**: Name of the input ruleset for which the output is generated. +2. **description**: Description of the ruleset, copied from the input ruleset for identification. +3. **tags**: Tags generated by all the matched "Tagging" rules in the ruleset. (See [Tag Action](./rules.md#tag-action)) +4. **violations**: A map containing a [Violation](https://github.com/konveyor/analyzer-lsp/blob/0008c1e70ae770d9ca7f73a5b723ce0fa7688b69/output/v1/konveyor/violations.go#L52-L74) type for every matched rule in the ruleset. (Keys are Rule IDs and values are their respective _Violations_) +5. **errors**: A map containing error strings for rules that the engine failed to evaluate. (Keys are Rule IDs and values are error strings describing the evaluation error) +6. **unmatched**: A list of Rule IDs in the ruleset that were evaluated but not matched. +7. **skipped**: A list of Rule IDs in the ruleset that were skipped because they didn't match the input label selector. (See [Label Selector](./labels.md#rule-label-selector)) + + +### Violations + +For every rule that is matched, the analyzer engine creates a _Violation_ in the output. + +* **description**: Text description about the match copied as-is from the rule. (See [Rule Metadata](./rules.md#rule-metadata)) + +* **category**: Pre-defined category string that indicates impact / severity of the problem. It is copied as-is from the rule. (See [Rule Categories](./rules.md#rule-categories)) + +* **labels**: A list of string labels copied as-is from the rule. (See [Rule Metadata](./rules.md#rule-metadata)) + +* **links**: A list of hyperlinks copied as-is from the rule. 
(See [Rule Links](./rules.md#links)) + * Each item in the list is a struct with the following fields: + * **url**: URL string. + * **title**: Title string. + +* **incidents**: A list of [_Incident_](https://github.com/konveyor/analyzer-lsp/blob/0008c1e70ae770d9ca7f73a5b723ce0fa7688b69/output/v1/konveyor/violations.go#L77-L87) types, each indicating a match of the rule in the source code. + * There can be multiple matches of a rule. Each such incident has the following fields: + * **uri**: File URI in the source code where the rule was matched. + * **lineNumber**: The line number in the file where the match was found. + * **message**: A message copied as-is from the rule. (See [Message Action](./rules.md#message-action)) + * **codeSnip**: Relevant lines from the source code where the rule was matched. + * **variables**: A map containing values of matched _CustomVariables_ in the rule. (See [Custom Variables](./rules.md#custom-variables)) + +* **effort**: Integer indicating story points for each incident as determined by the rule author. (See [Rule Metadata](./rules.md#rule-metadata)) + +### User Interface for Analysis Output + +A standalone user interface is available to visualize the YAML output in a static UI that runs in the browser. Check it out [here](https://github.com/konveyor/static-report). The [README](https://github.com/konveyor/static-report#readme) explains how it works with the YAML output. + diff --git a/docs/rules.md b/docs/rules.md index 48a3c1c7..85713d99 100644 --- a/docs/rules.md +++ b/docs/rules.md @@ -2,9 +2,23 @@ The analyzer rules are a set of instructions that are used to analyze source code and detect issues. Rules are fundamental pieces that codify modernization knowledge. -The analyzer parses user provided rules, evaluates them and generates _Violations_ for matched rules. A collection of one or more rules form a [_Ruleset_](#ruleset). _Rulesets_ provide an opionated way of organizing multiple rules that achieve a common goal. 
The analyzer CLI takes a set of rulesets as input arguments. +The analyzer parses user-provided rules, evaluates them against input source code and generates _Violations_ for matched rules. A collection of one or more rules forms a [Ruleset](#ruleset). _Rulesets_ provide an opinionated way of organizing multiple rules that achieve a common goal. -## Rule +## Table of Contents + +1. [Rule Format](#rule) + 1. [Rule Metadata](#rule-metadata) + 2. [Rule Actions](#rule-actions) + 1. [Tag Action](#tag-action) + 2. [Message Action](#message-action) + 3. [Rule Conditions](#rule-conditions) + 1. [Provider Condition](#provider-condition) + 2. [And Condition](#and-condition) + 3. [Or Condition](#or-condition) +2. [Ruleset Format](#ruleset) +3. [Passing rules / rulesets as input](#passing-rules-as-input) + +## Rule A Rule is written in YAML. It consists of metadata, conditions and actions. It instructs analyzer to take specified actions when given conditions match. @@ -13,54 +27,37 @@ A Rule is written in YAML. It consists of metadata, conditions and actions. It i Rule metadata contains general information about a rule: ```yaml -# id must be unique among a Ruleset -ruleId: "unique_id" -# violations have pre-defined categories -category: "potential|information|mandatory" -# labels are key=value pairs attached to rules, value -# can be empty or omitted, keys can be subdomain prefixed -labels: - # key=value pair +ruleID: "unique_id" (1) +labels: (2) - "label1=val1" - # valid label with value omitted - - "label2" - # valid label with empty value - - "label3=" - # subdomain prefixed key - - "konveyor.io/label1=val1" -# effort is an integer value to indicate level of -# effort needed to fix this issue -effort: 1 +effort: 1 (3) +category: mandatory (4) ``` -See [labels doc](./labels.md) for more details on `labels` field. +1. **ruleID**: This is a unique ID for the rule. It must be unique within the ruleset. +2. **labels**: A list of string labels associated with the rule. 
(See [Labels](./labels.md)) +3. **effort**: Effort is an integer value that indicates the level of effort needed to fix this issue. +4. **category**: Category describes the severity of the issue for migration. Values can be one of _mandatory_, _potential_ or _optional_. (See [Categories](#rule-categories)) -### Rule Actions +#### Rule Categories -A rule has `message` and `tag` actions. +* mandatory + * The issue must be resolved for a successful migration. If the changes are not made, the resulting application will not build or run successfully. Examples include replacement of proprietary APIs that are not supported in the target platform. +* optional + * If the issue is not resolved, the application should work, but the results may not be optimal. If the change is not made at the time of migration, it is recommended to put it on the schedule soon after your migration is completed. +* potential + * The issue should be examined during the migration process, but there is not enough detailed information to determine if the task is mandatory for the migration to succeed. -The `message` action generates a message for every violation created when rule matches. The message also supports templating in that the custom data exported by providers can be used in the message. -```yaml -# when a match is found, analyzer generates a violation with this message -message: "helpful message about the violation" -``` +### Rule Actions -Optionally, hyperlinks can be provided along with a message to give relevant information about the found issue: +A rule supports two actions - `tag` and `message`. Either or both of these actions can be defined on a rule. -```yaml -# links point to external hyperlinks -# rule authors are expected to provide -# relevant hyperlinks for quick fixes, docs etc -links: - - url: "konveyor.io" - title: "short title for the link" -``` +#### Tag Action -The `tag` action allows generating tags for the application. 
Each string in the tag can be a comma separated list of tags. Optionally, tags can have categories. +A tag action is used to create one or more tags for an application when the rule matches. It takes a list of string tags as its fields: ```yaml -# when a match is found, analyzer generates these tags for the application tag: # tags can be comma separated - "tag1,tag2,tag3" @@ -68,160 +65,176 @@ tag: - "Category=tag4,tag5" ``` -### Rule Conditions - -Finally, a rule contains a `when` block to specify a condition: - -```yaml -when: - - -``` +When a tag is a key=val pair, the keys are treated as category of that tag. For instance, `Backend=Java` is a valid tag with `Backend` being the category of tag `Java`. -`When` has exactly one condition. Multiple conditions can be nested within the top-level condition. +> Any rule that has a tag action in it is referred to as a "tagging rule". -#### Provider Conditions +#### Message Action -A "provider" knows how to analyse the source code of a technology. It publishes what it can do with the source code in terms of "capabilities". - -A provider condition instructs the analyzer to invoke a specific "provider" and use one of its "capabilities". In general, it is of the form `.`: +A message action is used to create an issue with the specified message when a rule matches: ```yaml -when: - .: - -``` - -Analyzer currently supports `builtin`, `java` and `go` providers. +# when a match is found, analyzer generates incidents each having this message +message: "helpful message about the violation" +``` -##### Builtin Provider +Message can also be templated to include information about the match interpolated via custom variables on the rule (See [Custom Variables](#custom-variables)): -`builtin` is an in-tree provider that can work with vaious different files and internal metadata generated by the engine. It has `file`, `filecontent`, `xml`, `json` and `hasTags` capabilities. 
+```yaml +- ruleID: lang-ref-004 + customVariables: + - pattern: '([A-z]+)\.get\(\)' + name: VariableName + message: "Found generic call - {{ VariableName }}" + when: + +``` -###### file +##### Links -`file` capability enables the provider to find files in the source code that match a given pattern: +Hyperlinks can be provided along with a `message` or `tag` action to give relevant information about the found issue: ```yaml -when: - builtin.file: - pattern: "" +# links point to external hyperlinks +# rule authors are expected to provide +# relevant hyperlinks for quick fixes, docs etc +links: + - url: "konveyor.io" + title: "short title for the link" ``` -###### filecontent +### Rule Conditions -`filecontent` capability enables the provider to search for content that matches a given pattern: +Every rule has a `when` block that contains exactly one condition. A condition defines a search query to be evaluated against the input source code. ```yaml when: - builtin.filecontent: - filePattern: "" - pattern: "" + ``` -###### xml +There are three types of conditions - _and_, _or_ and _provider_. While the _provider_ condition is responsible for performing an actual search in the source code, the _and_ and _or_ conditions are logical constructs provided by the engine to form a complex condition from the results of multiple other conditions. -`xml` capability enables the provider to query XPath expressions on a list of provided XML files. Unlike providers discussed so far, `xml` takes two input parameters: +#### Provider Condition -```yaml -when: - builtin.xml: - # xpath must be a valid xpath expression - xpath: "" - # filepaths is a list of files to scope xpath query to - filepaths: - - "/src/file1.xml" - - "/src/file2.xml" -``` +The analyzer engine enables multi-language source code analysis through components called “providers”. A "provider" knows how to analyse the source code of a technology. It publishes what it can do with the source code in terms of "capabilities". 
-###### json +A provider condition instructs the analyzer to invoke a specific "provider" and use one of its "capabilities". In general, it is of the form `.`: -`json` capability enables the provider to query XPath expressions on a list of provided JSON files. Unlike `xml`, currently `json` only takes xpath as input and performs the search on all json files in the codebase: +For instance, the `java` provider provides `referenced` capability. To search through Java source code, we can write a `java.referenced` condition: ```yaml when: - builtin.json: - # xpath must be a valid xpath expression - xpath: "" + java.referenced: + pattern: org.kubernetes.* + location: IMPORT ``` -#### hasTags +Note that depending on the provider, the fields of the condition (for instance, pattern and location above) will change. -`hasTags` enables the provider to query application tags. It doesn't deal with the source code, instead it queries the internal data structure to check whether given tags are present for the application: +Some providers have _dependency_ capability. It means that the provider can generate a list of dependencies for a given application. A dependency condition can be used to query this list and check whether a certain dependency (with a version range) exists for the application. For instance, to check if a Java application has a certain dependency, we can write a `java.dependency` condition: ```yaml when: - # when more than one tags are given, a logical AND is implied - hasTags: - - "tag1" - - "tag2" + java.dependency: + name: junit.junit + upperbound: 4.12.2 + lowerbound: 4.4.0 ``` -###### Java Provider - -Java provider can work with Java source code and provides capabilities `referenced` and `dependency`. +Analyzer currently supports `builtin`, `java`, `go` and `generic` providers. 
Here is the table that summarizes all the providers and their capabilities: -###### referenced +| Provider Name | Capabilities | Description | +| ------------- | ------------------------------------------------------------- | --------------------------------------------------------------------------------- | +| java | referenced | Find references of a pattern with an optional code location for detailed searches | +| | dependency | Check whether app has a given dependency | +| builtin | xml | Search XML files using xpath queries | +| | json | Search JSON files using jsonpath queries | +| | filecontent | Search content in regular files using regex patterns | +| | file | Find files with names matching a given pattern | +| | hasTags | Check whether a tag is created for the app via a tagging rule | +| go | referenced | Find references of a pattern | +| | dependency | Check whether app has a given dependency | -`referenced` capability enables the provider to find references in the source code. It takes two input parameters: +Based on the table above, we should be able to create the first part of the condition that doesn’t contain any of the condition fields. For instance, to create a `java` provider condition that uses `referenced` capability: ```yaml when: java.referenced: - # regex pattern to match - pattern: "" - # location defines the exact location where - # pattern should be matched - location: CONTRUCTOR_CALL + ``` -The supported locations are: -- CONSTRUCTOR_CALL -- TYPE -- INHERITANCE -- METHOD_CALL -- ANNOTATION -- IMPLEMENTS_TYPE -- ENUM_CONSTANT -- RETURN_TYPE -- IMPORT -- VARIABLE_DECLARATION +Depending on the _provider_ and the _capability_, there will be different `` in the condition. 
Following table summarizes available providers, their capabilities and all of their fields: + +| Provider | Capability | Fields | Required | Description | +|----------|-------------|------------|----------|---------------------------------------------------------------| +| java | referenced | pattern | Yes | Regex pattern | +| | | location | No | Source code location (See [Java Locations](#java-locations)) | +| | dependency | name | Yes | Name of the dependency | +| | | nameregex | No | Regex pattern to match the name | +| | | upperbound | No | Match versions lower than or equal to | +| | | lowerbound | No | Match versions greater than or equal to | +| builtin | xml | xpath | Yes | Xpath query | +| | | namespaces | No | A map to scope down query to namespaces | +| | | filepaths | No | Optional list of files to scope down search | +| | json | xpath | Yes | Xpath query | +| | | filepaths | No | Optional list of files to scope down search | +| | filecontent | pattern | Yes | Regex pattern to match in content | +| | | filePattern| No | Only search in files with names matching this pattern | +| | file | pattern | Yes | Find files with names matching this pattern | +| | hasTags | | | This is an inline list of string tags. See [Tag Action](#tag-action)| +| go | referenced | pattern | Yes | Regex pattern | +| | dependency | name | Yes | Name of the dependency | +| | | nameregex | No | Regex pattern to match the name | +| | | upperbound | No | Match versions lower than or equal to | +| | | lowerbound | No | Match versions greater than or equal to | + + +With the information above, we should be able to complete `java` condition we created earlier. We will search for references of a package: -###### Go Provider +```yaml +when: + java.referenced: + location: PACKAGE + pattern: org.jboss.* +``` -Go provider can work with Golang source code and provides capabilities `referenced` and `dependency`. 
+##### Java Locations -###### referenced +The java provider allows scoping the search down to certain source code locations. Any one of the following search locations can be used to scope down java searches: -`referenced` capability enables the provider to find references in the source code: +* CONSTRUCTOR_CALL +* TYPE +* INHERITANCE +* METHOD_CALL +* ANNOTATION +* IMPLEMENTS_TYPE +* ENUM_CONSTANT +* RETURN_TYPE +* IMPORT +* VARIABLE_DECLARATION -```yaml -when: - go.referenced: "" -``` -###### dependency +##### Custom Variables -The `dependency` capability enables the provider to find dependencies for an application: +Provider conditions can have associated "custom variables". Custom variables are used to capture relevant information from the matched line in the source code. The values of these variables will be interpolated with data matched in the source code. These values can be used to generate detailed templated messages in a rule’s action (See [Message action](#message-action)). They can be added to a rule in the `customVariables` field: ```yaml -when: - go.dependency: - # name of the dependency to search - name: "" - # upper bound on version of the depedency - upperbound: "" - # lower bound on version of the dependency - lowerbound: "" +- ruleID: lang-ref-004 + customVariables: + - pattern: '([A-z]+)\.get\(\)' (1) + name: VariableName (2) + message: "Found generic call - {{ VariableName }}" (3) + when: + java.referenced: + location: METHOD_CALL + pattern: com.example.apps.GenericClass.get ``` -A match is found when the given application has a dependency that falls within the given range of versions. +1. **pattern**: This is a regex pattern that will be matched on the source code line when a match is found. +2. **name**: This is the name of the variable that can be used in templates. +3. **message**: This is how to template a message using a custom variable. 
-#### Logical Conditions - -Analyzers provide two basic logical conditions that are useful in making more complex queries by aggregating results of other conditions. - -##### And Condition +#### And Condition The `And` condition takes an array of conditions and performs a logical "and" operation on their results: @@ -259,7 +272,7 @@ when: - go.referenced: "*CustomResourceDefinition*" ``` -##### Or Condition +#### Or Condition The `Or` condition takes an array of other conditions and performs a logical "or" operation on their results: @@ -274,19 +287,38 @@ when: A set of Rules form a Ruleset. Rulesets are an opionated way of passing Rules to Rules Engine. -Each Ruleset is stored in its own directory with a `ruleset.yaml` file at the directory root that stores metadata of the Ruleset. +A ruleset is created by placing one or more YAML rules files in a directory and creating a `ruleset.yaml` file (the golden file) in it. + +The golden file stores metadata of the Ruleset. ```yaml -# name has to be unique within the provided rulesets -# doesn't necessarily has to be unique globally -name: "Name of the ruleset" -description: "Describes the ruleset" -# additional labels for ruleset -# labels help filter rulesets -labels: - - awesome_rules1 +name: my-ruleset (1) +description: Text description about ruleset (2) +labels: (3) +- key=val ``` -Labels on a Ruleset are inherited by all the rules in it. +1. **name**: A unique name for the ruleset. +2. **description**: Text description about the ruleset. +3. **labels**: A list of string labels for the ruleset. The labels on a ruleset are automatically inherited by all rules in the ruleset. (See [Labels](./labels.md)) + +## Passing rules as input + +The analyzer CLI provides the `--rules` option to specify a YAML file containing rules or a ruleset directory: + +- It can be a file: + ```sh + konveyor-analyzer --rules rules-file.yaml ... + ``` + It is assumed that the file contains a list of YAML rules. 
The engine will automatically associate all rules in it with a default _Ruleset_. + +- It can be a directory: + ```sh + konveyor-analyzer --rules /ruleset/directory/ ... + ``` + It is assumed that the directory contains a _Ruleset_. (See [Ruleset](#ruleset)) -Rulesets provide a good way of organizing multiple rules that achieve a common goal. +- It can be given more than once with a mix of rules files and rulesets: + ```sh + konveyor-analyzer --rules /ruleset/directory/ --rules rules-file.yaml ... + ``` \ No newline at end of file diff --git a/docs/violations.md b/docs/violations.md deleted file mode 100644 index 324e237e..00000000 --- a/docs/violations.md +++ /dev/null @@ -1,66 +0,0 @@ -# Violations - -The analyzer rule engine creates a _Violation_ for every rule that matches except for rules that create tags. A _Violation_ provides details about a match. It also contains information inherited from the rule itself. - -### Violation Fields - -* **description**: Text description about the match. - -* **category**: Pre-defined category string that indicates impact / severity of the problem. It can be either one of following values: - * **mandatory**: Indicates a problem that rule author deemed as a blocker for modernization efforts and must be addressed. - * **optional**: Indicates a problem that rule author doesn't consider a blocker and can be fixed later. - * **potential**: Indicates a problem that rule author cannot certainly tell if it affects modernization efforts. - -* **labels**: A list of string labels inherited from rules. These are useful in filtering / identifying violations. - -* **links**: A list of hyperlinks provided by the rule author. Each item in the list is a struct with following fields: - * **url**: URL string. - * **title**: Title string. - -* **incidents**: A list of incident structs that each give detailed information about the match. - * **uri**: File uri in the source code where the rule was matched. 
- * **message**: A string that provides more information about the incident. - * **codeSnip**: Relevant lines from the source code where the rule was matched. - * **variables**: Raw structured information about the match generated by a provider. - -* **effort**: Integer value indicating story points for each incident as determined by the rule author. - -See [Violation struct](../output/v1/konveyor/violations.go) for more details. - -### Output Structure - -The overall output of an analysis run is organized by Rulesets. It is a list of Rulesets with each item containing a map of Violations that maps RuleID with its respective generated violation. See following example output generated when using 2 Rulesets as input to the analyzer: - -```yaml -- name: ruleset-00 - tags: - - License=Apache - violations: - chain-pom-001: - description: "" - category: potential - effort: 3 - incidents: - - uri: file:///analyzer-lsp/examples/customers-tomcat-legacy/pom.xml - message: "Found expected content in pom" - variables: - data: dependency - innerText: "\n\t\t\t\tcom.fasterxml.jackson\n\t\t\t\tjackson-bom\n\t\t\t\t${jackson.version}\n\t\t\t\timport\n\t\t\t\tpom\n\t\t\t" - matchingXML: com.fasterxml.jacksonjackson-bom${jackson.version}importpom -- name: ruleset-01 - tags: - - License=Apache - violations: - chain-pom-001: - description: "" - category: potential - effort: 3 - incidents: - - uri: file:///analyzer-lsp/examples/customers-tomcat-legacy/pom.xml - message: "Found expected content in pom" - variables: - data: dependency - innerText: "\n\t\t\t\tcom.fasterxml.jackson\n\t\t\t\tjackson-bom\n\t\t\t\t${jackson.version}\n\t\t\t\timport\n\t\t\t\tpom\n\t\t\t" - matchingXML: com.fasterxml.jacksonjackson-bom${jackson.version}importpom -``` -