-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ideas to improve "yr fmt" #192
Comments
I like those and add a few more:
vs
vs
My plan was having a TOML configuration file not only for the code formatter, but for the CLI as a whole, where code formatting is a section. This crate could be useful for finding the user's home directory: https://crates.io/crates/home |
I haven't looked into it yet, but if possible I think all of this should be done in a library so it can be easily used by other tools. This way we get consistent formatting behavior across a variety of tools if they all use the library. I don't know what the equivalent of "libyara" is in YARA-X but it would make sense to expose it there, I think. |
Yes, of course. The config file will be used only by the CLI, for setting the right configuration for the yara-x-fmt crate. The |
Control over 'segmentation' of hex identifiers. Seems reasonable to me that someone might want to have control over newline behavior in hex identifiers, and a rational default of 16 seems non-contemptible to settle on for this behavior (if not already implemented.) |
Introduction of line breaks at appropriate places, specially in the case of rule condition, is a pending improvement too. That's a bit more complicated, it's not a low hanging fruit like the rest. At his moment the formatter let you control how the lines are broken. |
I just spotted this nd thought I'd add a few things we implement externally to YARA that might be worth considering in a tool like this:
I'm sure there's more, but I can't think of any right now. |
This one is veering more into what I would consider a linter than a formatter. Typically, as long as the rule is syntactically valid the formatter should not care about the specific content of the rule. However, with that said I'm thinking it may be useful to combine this idea into the formatter. I'm thinking of a config that would be like (python-like syntax below, I'd have to figure out how to do it in toml):
The idea would be a list of tuples that define the field name and required type. The linter could then use this to ensure the fields exist and are of the correct type. The default would have to be an empty list. The idea of bringing in linter ideas here can also be extended to other interesting things that people like to do when writing rules. We could have something like:
With this you could define a regex that must match the rule name. Some places encode things in the rule name and having a way to enforce it in the linter would be nice. What are your thoughts on these @plusvic?
I've added this and the required fields one to the list at the top of this thread for tracking. |
Formatting rule conditions is indeed more complex. I've outlined the best approach to handle this in my YARA Style Guide. For instance, many beginners tend to place "and" or "or" at the end of a line:
However, through experience, I've found that placing these operators at the start of the line improves readability and reduces the time required to understand the condition:
This is trickier than it might seem:
The challenge is that the prefix might not always be obvious. For instance, is it "s," "sa," or "ss"? It's not always straightforward to apply consistent line breaks.
I wouldn’t recommend sorting metadata alphabetically. While it might seem neat, it’s more important to order metadata by relevance. Just as on an ID card or CV, the key details should appear first. For example, description should come before anything else, and less critical information like hash values can go towards the end. Sorting alphabetically would place less relevant fields like author at the top, even though they are not essential for understanding the rule.
While my style guide provides recommendations for naming YARA rules, I’m hesitant to enforce a strict naming convention. Internally, our team does apply such standards to maintain consistency in our rule sets. However, outside of a specific team or context, I’d prefer to offer guidance rather than impose mandatory enforcement. Consistency is valuable, but flexibility also has its place. |
Hello! I think that for any formatter these things can be optional, so even if some people might not use them, I think having the option to use them isn't necessarily a bad thing. To give some rationale for some of the points discussed above:
As in Python (isort), this makes it easiest to figure out what metadata is or is not present, and means I can keyword scan down the left. e.g. I'm looking for an "author" field, I'd quickly find its position regardless of who wrote the rule.
I think it's OK either way. The main thing for me for with this is how easy it is to comment out parts of the condition to test the rule without them. Perhaps due to muscle-memory I find the EOL operator easier to read. In the example you listed where the rule consists of a series of Cheers, |
I haven't dug into it in detail yet but as discussed elsewhere, we should add various options to allow people to control the formatting (and even other things, like variable names, maybe?) in "yr fmt" - I'm happy to do the work here but want to open this issue to get input on it.
For example, I'm thinking of things like
rule foo {\n
vsrule foo\n{
)for any i in (0..1): (\n
vsfor any i in (0..1):\n(
)I'm sure there are plenty more here to be found, especially as I dig into the code, but hoping to collect your thoughts @plusvic (and others) on ways to improve "yr fmt" so I can go off and start implementing.
EDIT: I'm putting the things mentioned below into the list above so I can edit it to mark the ones I have done.
The text was updated successfully, but these errors were encountered: