Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add encode-data-using-add-xor-sub-operations.yml #800

Merged
merged 3 commits into from
Nov 22, 2023

Conversation

jtothej
Copy link
Contributor

@jtothej jtothej commented Jul 15, 2023

Add encode-data-using-add-xor-sub-operations.yml
The rule feels a bit brute force-y but I'm not sure if there's more elegant way to do it.

Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting idea. i'm also not sure if this is the right way to express the rule. can you imagine a different syntax or other feature that would let you express this more easily? could we introduce another characteristic that would enable this?

- and:
- characteristic: tight loop
- instruction:
- mnemonic: xor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be replaced with characteristic(nzxor) (though technically that also matches xor reg,reg too).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this rule I wanted to express the following logic: require ADD, SUB and XOR instructions to be present in a single basic block and require immediate value operand for ADD, SUB and XOR instructions to be within (0x1,0xff) range.

characteristic: nzxor could work here but I chose to be as specific as possible - code I'm trying to match typically looks like this:

add     dl, 4Fh
xor     dl, 0F1h
sub     dl, 4Fh

One thing that I think could make the rule a bit more cleaner would be support for ranges or list of values by the number feature, so e.g.

- and:
  - characteristic: tight loop
  - instruction:
    - mnemonic: xor
    - operand[1].number: (0x1,0xff)
  - instruction:
    - mnemonic: add
    - operand[1].number: (0x1,0xff)
  - instruction:
    - mnemonic: sub
    - operand[1].number: (0x1,0xff)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this work as an approximation?

- not:
  - operand[1].number: 0

this would include larger values though, so may be too broad?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All code samples I've seen use single byte values as ADD/XOR/SUB operands so any non-zero value might be too broad.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, ok, I'm not sure how to best proceed here then without changing the rule syntax (not worth it ATM IMHO) or just merging as is

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williballenthin are we worried about performance here? @jtothej could we add other features to quickly filter out likely FPs (large functions, etc.)?

Copy link
Collaborator

@williballenthin williballenthin Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mr-tz i agree that its probably not worth changing the rule syntax for a single rule today. if we can come up with other places the syntax change is useful then lets reconsider.

@mr-tz i am somewhat worried about performance. i think this can be assessed by running the test cases before/after this PR and seeing if there's any meaningful delta. i wouldn't be surprised if my intuition is wrong about performance, so this shouldn't be a blocker without real data. sorry @jtothej for bogging down the PR merge with this worry.

@jtothej i think we should consider removing some common, small values from the ADD/SUB branches to account for iteration over an array, like bytewise processing of a string, wstring, etc. So maybe remove 1/2/4/8/16 from ADD/SUB branches. thoughts?

Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with a review of the test case matches (assume it works) and other matches in our test corpus (TODO), then a sanity check on performance impact, then im ok with this being merged.

Comment on lines 279 to 280
- operand[1].number: 0x01
- operand[1].number: 0x02
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- operand[1].number: 0x01
- operand[1].number: 0x02

common string/wstring index deltas, remove to avoid FPs?

note this is based on intuition, so feel free to push back/discuss.

- operand[1].number: 0x01
- operand[1].number: 0x02
- operand[1].number: 0x03
- operand[1].number: 0x04
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- operand[1].number: 0x04

common array index deltas, remove to avoid FPs?

note this is based on intuition, so feel free to push back/discuss.

- operand[1].number: 0x05
- operand[1].number: 0x06
- operand[1].number: 0x07
- operand[1].number: 0x08
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- operand[1].number: 0x08

common array index deltas, remove to avoid FPs?

note this is based on intuition, so feel free to push back/discuss.

Comment on lines 537 to 538
- operand[1].number: 0x01
- operand[1].number: 0x02
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- operand[1].number: 0x01
- operand[1].number: 0x02

common string/wstring index deltas, remove to avoid FPs?

note this is based on intuition, so feel free to push back/discuss.

@mr-tz
Copy link
Collaborator

mr-tz commented Nov 20, 2023

according to offline discussions, Willi's suggestions above or this approximation both work also

  features:
    - and:
      - count(basic blocks): 6 or fewer
      - basic block:
        - and:
          - characteristic: tight loop
          - characteristic: nzxor
          - count(mnemonic(add)): 1
          - count(mnemonic(sub)): 1

@mr-tz mr-tz merged commit 6ad4499 into mandiant:master Nov 22, 2023
3 checks passed
@mr-tz
Copy link
Collaborator

mr-tz commented Nov 22, 2023

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants