Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add tag linter. #304

Merged
merged 9 commits into from
Feb 12, 2025
Merged

feat: Add tag linter. #304

merged 9 commits into from
Feb 12, 2025

Conversation

wxsBSD
Copy link
Contributor

@wxsBSD wxsBSD commented Feb 7, 2025

Add a tag linter that can be configured to use either a list of allowed strings, or a regexp. If both are provided in the config file the allowed list is preferred by the "check" command as it is more explicit.

As part of this work I realized we have no way to return multiple linter findings in a single check call, which is necessary when checking tags, so I modified the internal API to return a Option<Vec<LinterResult> where no linter findings is represented by returning None.

I also discovered that the way the compiler works is that any linter errors are collected but we only return the first one when calling add_source() which is not ideal when you can have multiple linter errors collected up. To deal with this I added a linter_errors: Vec<CompileError> to the compiler struct so that they can be checked for separately from actual compiler errors.

I also had to modify the output handling of the check command so that multiple linter errors are handled correctly.

Lastly, a couple of typos and copy/paste mistakes in documentation were fixed.

Add a tag linter that can be configured to use either a list of allowed strings,
or a regexp. If both are provided in the config file the allowed list is
preferred by the "check" command as it is more explicit.

As part of this work I realized we have no way to return multiple linter
findings in a single check call, which is necessary when checking tags, so I
modified the internal API to return a Option<Vec<LinterResult> where no linter
findings is represented by returning None.

I also discovered that the way the compiler works is that any linter errors are
collected but we only return the first one when calling "add_source()" which
is not ideal when you can have multiple linter errors collected up. To deal with
this I added a "linter_errors: Vec<CompileError>" to the compiler struct so that
they can be checked for separately from actual compiler errors.

I also had to modify the output handling of the "check" command so that multiple
linter errors are handled correctly.

Lastly, a couple of typos and copy/paste mistakes in documentation were fixed.
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello wxsBSD! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

I am currently reviewing this pull request. In the meantime, I'm providing a summary of the changes for you and other reviewers to quickly understand the intent and content. This pull request introduces a new tag linter to the YARA-X compiler. This linter can be configured to enforce that rule tags either belong to a list of allowed tags or match a given regular expression. The PR also modifies the compiler to collect multiple linter errors, instead of just the first one, and updates the check command to handle multiple linter errors correctly. Additionally, some typos and copy/paste errors in the documentation have been fixed.

Highlights

  • Tag Linter: Implements a new tag linter with support for allowed lists and regular expressions, configurable via the check.tags section in the config file.
  • Multiple Linter Errors: Modifies the compiler to collect and report multiple linter errors, improving the granularity of error reporting.
  • Error Handling: Updates the check command to correctly handle and display multiple linter errors.
  • Configuration: Adds TagConfig struct to cli/src/config.rs to configure the tag linter.
  • Compiler Modifications: Adds linter_errors field to the Compiler struct in lib/src/compiler/mod.rs to store linter errors separately from compiler errors.

Changelog

  • cli/src/commands/check.rs
    • Adds logic to use the tag linter based on configuration.
    • Modifies the output to display multiple linter errors if present.
    • Prioritizes the allowed list over the regex for tag validation.
  • cli/src/config.rs
    • Introduces TagConfig struct for configuring the tag linter, including allowed list, regexp, and error flag.
  • lib/src/compiler/errors.rs
    • Adds InvalidTag and UnknownTag error types to CompileError enum.
    • Updates error codes for InvalidRuleName to E039 and adds E040 and E041 for the new tag errors.
  • lib/src/compiler/linters.rs
    • Introduces the Tags struct and implements the LinterInternal trait for it.
    • Modifies LinterInternal::check to return Option<Vec<LinterResult>> to allow for multiple linter findings.
    • Updates RuleName and Metadata linters to return Option<Vec<LinterResult>>.
  • lib/src/compiler/mod.rs
    • Adds linter_errors field to the Compiler struct to store linter errors separately.
    • Modifies Compiler::add_source to collect linter errors and not return immediately on the first error.
    • Adds Compiler::linter_errors method to retrieve all linter errors.
    • Updates the linter execution logic to handle the new Option<Vec<LinterResult>> return type.
  • lib/src/compiler/tests/mod.rs
    • Updates tests to check for linter errors instead of compiler errors when the error flag is set to true.
  • lib/src/compiler/warnings.rs
    • Adds InvalidTag and UnknownTag warning types to Warning enum.
  • site/content/docs/cli/config.md
    • Adds documentation for the check.tags configuration options.

Using Gemini Code Assist

The full user's guide for Gemini Code Assist for GitHub can be found on our documentation page, though is a condensed version for your convenience.

Features

Gemini Code Assist is designed to act as a human reviewer on pull requests to provide instant, high quality code reviews. When a pull request is first opened (including as a draft), Gemini Code Assist will first provide a high level summary of the pull request to inform reviewers to help them quickly get up to speed. Next, Gemini Code Assist will offer an in depth code review, providing feedback throughout pull request, and offering directly committable code suggestion where useful. Users can engage with Gemini in the comments, with Gemini using it's deep knowledge to engage with the author and reviewers.

Gemini Code Assist will not approve a pull request as that is best left to human reviewers. It's feedback is intended to help the author and reviewers focus on what still needs to be done.

Below is a summary of supported features and their commands to invoke them at any point via an issue comment here on the main pull request page. To issue a command, first tag Gemini Code Assist and then provide the command, e.g. @gemini-code-assist review`.

Feature Command Description
Code Review review Performs a code review for the current pull request in its current state.
Pull Request Summary summary Provides a summary of the current pull request in its current state.
Comment Responds in comments when explicitly tagged, both in issue comments and review comments.
Repository Walkthrough walkthrough Generates a code walkthrough of the underlying repository to best help you get up to speed.
Help help Displays a list of available commands.
Customizability

To enhance the Gemini Code Assist for GitHub experience, we allow repository maintainers the ability to customize functionality to fit their needs. This includes the ability to tailor the actions of Gemini Code Assist, as well as preferred style guides to take into account when providing feedback. As specified below, this is done by creating a .gemini/ folder in the base of the repository, and the version merged into the default branch will be referenced during the pull request.

Configuration

To configure Gemini Code Assist's actions, create a configuration file at .gemini/config.yaml. As an example, below is a config that only shows review comments that are of severity MEDIUM or higher.

code_review:
  disable: false
  comment_severity_threshold: MEDIUM

The full configuration file schema can be found on our documentation page.

Style guide

By default, Gemini Code Assist references industry standard language style guides for its reviews, such as PEP-8 for Python. Gemini Code Assist will also try to infer any style guide that the repository may be implicitly using. In any case, when making suggestions, Gemini Code Assist will state the style guide referenced when applicable.

To set a custom style guide for your repository, you can create a .gemini/styleguide.md file at the base of the repository. The contents of this file are read as a plain text file, though feel free to use markdown syntax to form your style guide. Note that any links included in this style guide will not be read in by Gemini Code Assist at this time.

Code Review Criteria

Gemini Code Assist will review changes and provide feedback for pull requests across many different languages and file types. Gemini Code Assist is instructed to focus on (but not limit itself to) providing feedback on the following topics:

  1. Correctness: Ensure the code functions as intended and handles edge cases.
  2. Efficiency: Identify potential performance bottlenecks or areas for optimization.
  3. Maintability: Assess code readability, modularity, and adherence to language idioms and best practices.
  4. Security: Identify potential vulnerabilities in data handling or input validation.

For each review comment Gemini Code Assist leaves, it'll provide one of the following severity rating indicating how important the feedback is to address:

  • CRITICAL: This issue must be addressed immediately, as it could lead to serious consequences for the code's correctness, security, or performance.
  • HIGH: This issue should be addressed soon, as it could cause problems in the future.
  • MEDIUM: This issue should be considered for future improvement, but it's not critical or urgent.
  • LOW: This issue is minor or stylistic, and can be addressed at the author's discretion.
Limitations

Gemini Code Assist is still in development and may make mistakes. Please leave feedback on any instances where it's behvaiour is incorrect, or counter productive. You can use the 👍 and 👎 buttons in the comments to provide feedback.


A tag's gentle guide,
List or regex, choose your side,
Clean rules take flight.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request introduces a tag linter to yara-x, enhancing its capabilities to enforce tag-related standards in YARA rules. The changes include modifications to the CLI, configuration, compiler, and related tests. Overall, the implementation appears well-structured and addresses the stated goals. However, some areas could benefit from further refinement and clarification.

@wxsBSD
Copy link
Contributor Author

wxsBSD commented Feb 7, 2025

I haven't added tests for this yet, but am happy to do so once you're happy with these changes.

I initially had it so that linter warnings were in their own collection, like I am doing for linter errors, but decided against that for now. It is conceptually nicer to have a list of compiler warnings and errors separate from a list of linter warnings and errors, so that they can be treated differently by different programs. I'm happy to move linter warnings into their own list if you like that idea more.

@plusvic
Copy link
Member

plusvic commented Feb 10, 2025

As part of this work I realized we have no way to return multiple linter findings in a single check call, which is necessary when checking tags, so I modified the internal API to return a Option<Vec<LinterResult> where no linter findings is represented by returning None.

I also discovered that the way the compiler works is that any linter errors are collected but we only return the first one when calling add_source() which is not ideal when you can have multiple linter errors collected up. To deal with this I added a linter_errors: Vec<CompileError> to the compiler struct so that they can be checked for separately from actual compiler errors.

I wouldn't change the compiler's behavior with respect to errors. The current behaviour is that Compiler::add_source will return the first error it encounters while compiling the source code. A single call to Compiler::add_source ] can produce more than one error, and add them to the list that is obtained later via Compiler::errors(), but the compiler doesn't make any guarantees about finding all possible errors. When an error occurs, the compiler can abort immediately and refuse to keep compiling the current set of rules (in must cases that's what it does). So, in the case of the tags linter, if more than one tag fails to comply with the requirements, it's ok if the compiler only returns the first error. Also, I wouldn't make distinctions between compiler errors and linter errors, I prefer treating all errors in the same way, I think it's simpler and easier to understand that way. Telling a linter to produce errors means that it produces compiling errors as the compiler would do in all other cases.

Warnings are a different story. It's true that the current API doesn't allow a linter to produce multiple warnings, and that's a limitation. But instead of returning Option<Vec<LinterResult>> from the linter,
I would add another variant to LinterResult, we alredy have Ok, Warn and Err, we could also add Warns(Vec<Warnings>), for cases in which the linter wants to return more than one warning. The linter can still using the Warn variant of returning a single warning.

Address API concerns from Victor. The compiler now returns only the first linter
error, but still returns multiple warnings if they are found.
@wxsBSD
Copy link
Contributor Author

wxsBSD commented Feb 10, 2025

Ok, I think I've addressed your initial round of comments. I'll add a couple of tests and this should be ready for review.

While here, fix an incorrect capitalization in a warning message so it is
consistent with other warnings.
Move the Tags linter creation so that it is similar to the other linters. Now to
create a Tags linter you use `linters::tags_allowed()` or
`linters::tag_regex()`, which brings it in line with how other linters are
created.
@plusvic plusvic enabled auto-merge (squash) February 12, 2025 16:36
@plusvic plusvic merged commit 57dde57 into VirusTotal:main Feb 12, 2025
15 checks passed
@wxsBSD wxsBSD deleted the tag_lint branch February 12, 2025 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants