diff --git a/doc/format.md b/doc/format.md index 09a4db37..8d1256a6 100644 --- a/doc/format.md +++ b/doc/format.md @@ -43,10 +43,24 @@ We'll start at the high level structure and then dig into the logic structures a - [rule format](#rule-format) - [yaml](#yaml) - [meta block](#meta-block) + - [rule name](#rule-name) + - [rule namespace](#rule-namespace) + - [analysis flavors](#analysis-flavors) - [features block](#features-block) - [extracted features](#extracted-features) - - [characteristic](#characteristic) - - [instruction features](#instruction-features) + - [static analysis scopes](#static-analysis-scopes) + - [instruction features](#instruction-features) + - [basic block features](#basic-block-features) + - [function features](#function-features) + - [dynamic analysis scopes](#dynamic-analysis-scopes) + - [call features](#call-features) + - [thread features](#thread-features) + - [process features](#process-features) + - [common scopes](#common-scopes) + - [file features](#file-features) + - [global features](#global-features) + - [complete feature listing](#complete-feature-listing) + - [characteristic](#characteristic) - [namespace](#namespace) - [class](#class) - [api](#api) @@ -57,9 +71,6 @@ We'll start at the high level structure and then dig into the logic structures a - [offset](#offset) - [mnemonic](#mnemonic) - [operand](#operand) - - [basic block features](#basic-block-features) - - [function features](#function-features) - - [file features](#file-features) - [string and substring](#file-string-and-substring) - [export](#export) - [import](#import) @@ -67,7 +78,6 @@ We'll start at the high level structure and then dig into the logic structures a - [function-name](#function-name) - [namespace](#namespace) - [class](#class) - - [global features](#global-features) - [os](#os) - [arch](#arch) - [format](#format) @@ -110,19 +120,20 @@ meta: name: packed with UPX namespace: anti-analysis/packer/upx authors: - - william.ballenthin@mandiant.com + - william.ballenthin@mandiant.com description: the sample appears to be packed with UPX - scope: file + scopes: + static: file + dynamic: file att&ck: - Defense Evasion::Obfuscated Files or Information [T1027.002] mbc: - - Anti-Static Analysis::Software Packing + - Anti-Static Analysis::Software Packing examples: - CD2CBA9E6313E8DF2C1273593E649682 - Practical Malware Analysis Lab 01-02.exe_:0x0401000 ``` - Here are the common fields: - `name` is required. This string should uniquely identify the rule. More details below. @@ -133,14 +144,19 @@ Here are the common fields: - `description` is optional text that describes the intent or interpretation of the rule. - - `scope` indicates to which feature set this rule applies. - Here are the legal values: - - **`file`**: matches features across the whole file. - - **`function`** (default): match features within each function. - - **`basic block`**: matches features within each basic block. - This is used to achieve locality in rules (for example for parameters of a function). - - **`instruction`**: matches features found at a single instruction. - This is great to identify structure access or comparisons against magic constants. + - `scopes` indicates which feature set the rule applies to, when analyzing static or dynamic analysis artifacts. There are two required sub fields: `static` and `dynamic`. Here are the legal values: + - `scopes.static`: + - **`instruction`**: matches features found at a single instruction. + This is great to identify structure access or comparisons against magic constants. + - **`basic block`**: matches features within each basic block. + This is used to achieve close locality in rules (for example for parameters of a function). + - **`function`**: match features within each function. + - **`file`**: matches features across the whole file. + - `scopes.dynamic`: + - **`call`**: match features at each traced API call site, such as API name and argument values. + - **`thread`**: match features within each thread, such as sequence of API names. + - **`process`**: match features within each process. + - **`file`**: matches features across the whole file, including from the executable file features *and* across the entire runtime trace. - `att&ck` is an optional list of [ATT&CK framework](https://attack.mitre.org/) techniques that the rule implies, like `Discovery::Query Registry [T1012]` or `Persistence::Create or Modify System Process::Windows Service [T1543.003]`. @@ -252,6 +268,64 @@ rules/host-interaction/file-system/list The depth of the namespace tree is not limited, but we've found that 3-4 components is typically sufficient. +### analysis flavors + +capa analyzes capabilities found in both executable files and in API traces captured by sandboxes, such as CAPE. +We call these categories of analysis "flavors" and use "static analysis flavor" and "dynamic analysis flavor" to refer to them, respectively. Static analysis is great for reviewing the entire logic of a program and finding the interesting regions. Dynamic analysis via sandboxes helps bypass packing, which is very widespread in malware, and can better describe the actual runtime behavior of a program. We use the `meta.scopes.$flavor` key to specify how a rule interacts with a particular flavor. + +When possible, we try to write capa rules that work in both static and dynamic analysis flavors. +For example, here's a rule that matches in both flavors: + +```yml +rule: + meta: + name: create mutex + namespace: host-interaction/mutex + authors: + - moritz.raabe@mandiant.com + - michael.hunhoff@mandiant.com + scopes: + static: function + dynamic: call + features: + - or: + - api: kernel32.CreateMutex + - api: kernel32.CreateMutexEx + - api: System.Threading.Mutex::ctor +``` + +See how `create mutex` can be reasoned about both by inspecting the disassembly features (static analysis) as well as the runtime API trace (dynamic analysis)? + +On the other hand, some behaviors are best described by rules that work in only one scope. +Remember, its paramount that rules be human-readable, so avoid complicating logic for the sake of merging rules. +In this case, mark the excluded scope with `unsupported`, like in the following rule: + +```yml +rule: + meta: + name: check for software breakpoints + namespace: anti-analysis/anti-debugging/debugger-detection + authors: + - michael.hunhoff@mandiant.com + scopes: + static: function + dynamic: unsupported # requires mnemonic features + features: + - and: + - or: + - instruction: + - mnemonic: cmp + - number: 0xCC = INT3 + - match: contain loop +``` + +`check for software breakpoints` works great during disassembly analysis, where low-level instruction features can be matched, but doesn't work in dynamic scopes because these features aren't available. Hence, we mark the rule `scopes.dynamic: unsupported` so the rule won't be considered when processing sandbox traces. + +As you'll see in the [extracted features](#extracted-features) section, capa matches features at various scopes, starting small (e.g., `instruction`) and growing large (e.g., `file`). In static analysis, scopes grow from `instruction`, to `basic block`, `function`, and then `file`. In dynamic analysis, scopes grow from `call`, to `thread`, `process`, and then to `file`. + +When matching a sequence of API calls, the static scope is often `function` and the dynamic scope is `thread`. When matching a single API call with arguments, the static scope is usually `basic block` and the dynamic scope is `call`. One day we hope to support `call` scope directly in the static analysis flavor. + + ## features block This section declares logical statements about the features that must exist for the rule to match. @@ -288,49 +362,53 @@ If only one of these features is found in a function, the rule will not match. # extracted features -capa extracts features from multiple scopes, starting with the most specific (instruction) and working towards the most general: +capa matches features at multiple scopes, starting small (e.g., `instruction`) and growing large (e.g., `file`). In static analysis, scopes grow from `instruction`, to `basic block`, `function`, and then `file`. In dynamic analysis, scopes grow from `call`, to `thread`, `process`, and then to `file`: + +| static scope | best for... | +|--------------|------------------------------------------------------------------------------------------| +| instruction | specific combinations of mnemonics, operands, constants, etc. to find magic values | +| basic block | closely related instructions, such as structure access or function call arguments | +| function | collections of API calls, constants, etc. that suggest complete capabilities | +| file | high level conclusions, like encryptor, backdoor, or statically linked with some library | +| global | the features available at every scope, like architecture or OS | -| scope | best for... | -|-------------|------------------------------------------------------------------------------------------| -| instruction | specific combinations of mnemonics, operands, constants, etc. to find magic values | -| basic block | closely related instructions, such as structure access or function call arguments | -| function | collections of API calls, constants, etc. that suggest complete capabilities | -| file | high level conclusions, like encryptor, backdoor, or statically linked with some library | -| (global) | the features available at every scope, like arch or OS | +| dynamic scope | best for... | +|---------------|------------------------------------------------------------------------------------------| +| call | single API call and its arguments | +| thread | sequence of related API calls | +| process | combinations of other capabilities found within a (potentially multi-threaded) program | +| file | high level conclusions, like encryptor, backdoor, or statically linked with some library | +| global | the features available at every scope, like architecture or OS | In general, capa collects and merges the features from lower scopes into higher scopes; for example, features extracted from individual instructions are merged into the function scope that contains the instructions. This way, you can use the match results against instructions ("the constant X is for crypto algorithm Y") to recognize function-level capabilities ("crypto function Z"). - -### characteristic - -Characteristics are features that are extracted by the analysis engine. -They are one-off features that seem interesting to the authors. - -For example, the `characteristic: nzxor` feature describes non-zeroing XOR instructions. - -| characteristic | scope | description | -|--------------------------------------|------------------------------------|-------------| -| `characteristic: embedded pe` | file | (XOR encoded) embedded PE files. | -| `characteristic: forwarded export` | file | PE file has a forwarded export. | -| `characteristic: mixed mode` | file | File contains both managed and unmanaged (native) code, often seen in .NET | -| `characteristic: loop` | function | Function contains a loop. | -| `characteristic: recursive call` | function | Function is recursive. | -| `characteristic: calls from` | function | There are unique calls from this function. Best used like: `count(characteristic(calls from)): 3 or more` | -| `characteristic: calls to` | function | There are unique calls to this function. Best used like: `count(characteristic(calls to)): 3 or more` | -| `characteristic: tight loop` | basic block, function | A tight loop where a basic block branches to itself. | -| `characteristic: stack string` | basic block, function | There is a sequence of instructions that looks like stack string construction. | -| `characteristic: nzxor` | instruction, basic block, function | Non-zeroing XOR instruction | -| `characteristic: peb access` | instruction, basic block, function | Access to the process environment block (PEB), e.g. via fs:[30h], gs:[60h] | -| `characteristic: fs access` | instruction, basic block, function | Access to memory via the `fs` segment. | -| `characteristic: gs access` | instruction, basic block, function | Access to memory via the `gs` segment. | -| `characteristic: cross section flow` | instruction, basic block, function | Function contains a call/jump to a different section. This is commonly seen in unpacking stubs. | -| `characteristic: indirect call` | instruction, basic block, function | Indirect call instruction; for example, `call edx` or `call qword ptr [rsp+78h]`. | -| `characteristic: call $+5` | instruction, basic block, function | Call just past the current instruction. | -| `characteristic: unmanaged call` | instruction, basic block, function | Function contains a call from managed code to unmanaged (native) code, often seen in .NET | - -## instruction features +| feature | static scope | dynamic scope | +|-----------------------------------|---------------------------------------------|--------------------------------| +| [api](#api) | instruction ↦ basic block ↦ function ↦ file | call ↦ thread ↦ process ↦ file | +| [string](#string-and-substring) | instruction ↦ ... | call ↦ ... | +| [bytes](#bytes) | instruction ↦ ... | call ↦ ... | +| [number](#number) | instruction ↦ ... | call ↦ ... | +| [characteristic](#characteristic) | instruction ↦ ... | - | +| [mnemonic](#mnemonic) | instruction ↦ ... | - | +| [operand](#operand) | instruction ↦ ... | - | +| [offset](#offset) | instruction ↦ ... | - | +| [com](#com) | instruction ↦ ... | - | +| [namespace](#namespace) | instruction ↦ ... | - | +| [class](#class) | instruction ↦ ... | - | +| [property](#property) | instruction ↦ ... | - | +| [export](#export) | file | file | +| [import](#import) | file | file | +| [section](#section) | file | file | +| [function-name](#function-name) | file | - | +| [os](#os) | global | global | +| [arch](#arch) | global | global | +| [format](#format) | global | global | + +## static analysis scopes + +### instruction features Instruction features stem from individual instructions, such as mnemonics, string references, or function calls. The following features are relevant at this scope and above: @@ -357,6 +435,109 @@ Also, the following [characteristics](#characteristic) are relevant at this scop - `call $+5` - `unmanaged call` +### basic block features + +Basic block features stem from combinations of features from the instruction scope that are found within the same basic block. + +Also, the following [characteristics](#characteristic) are relevant at this scope and above: + - `tight loop` + - `stack string` + +### function features + +Function features stem from combinations of features from the instruction and basic block scopes that are found within the same function. + +Also, the following [characteristics](#characteristic) are relevant at this scope and above: + - `loop` + - `recursive call` + - `calls from` + - `calls to` + +## dynamic analysis scopes + +### call features + +Call features are collected from individual sandbox trace events, such as API calls. +They're typically useful for matching against the API name and arguments (strings or integer constants). + +The following features are relevant at this scope and above: + + - [api](#api) + - [number](#number) + - [string and substring](#string-and-substring) + - [bytes](#bytes) + +### thread features + +Thread features stem from combinations of features from the call scopes that are found within the same thread. +This is useful for matching a sequence of API calls, such as `OpenFile`/`ReadFile`/`CloseFile`. + +There are no thread-specific features. + +### process features + +Process features are combinations of features from the thread scopes found within the same process. +This is useful for matching behaviors found across an entire program, even if its multi-threaded. + +There are no process-specific features. + +## common scopes + +### file features + +File features stem from the file structure, i.e. PE structure or the raw file data. + +Also, all features found in all functions (static) or all processes (dynamic) are collected into the file scope. + +The following features are supported at this scope: + + - [string and substring](#file-string-and-substring) + - [export](#export) + - [import](#import) + - [section](#section) + - [function-name](#function-name) + - [namespace](#namespace) + - [class](#class) + +### global features + +Global features are extracted at all scopes. +These are features that may be useful to both disassembly and file structure interpretation, such as the targeted OS or architecture. +The following features are supported at this scope: + + - [os](#os) + - [arch](#arch) + - [format](#format) + +## complete feature listing + +### characteristic + +Characteristics are features that are extracted by the analysis engine. +They are one-off features that seem interesting to the authors. + +For example, the `characteristic: nzxor` feature describes non-zeroing XOR instructions. + +| characteristic | scope | description | +|--------------------------------------|------------------------------------|-----------------------------------------------------------------------------------------------------------| +| `characteristic: embedded pe` | file | (XOR encoded) embedded PE files. | +| `characteristic: forwarded export` | file | PE file has a forwarded export. | +| `characteristic: mixed mode` | file | File contains both managed and unmanaged (native) code, often seen in .NET | +| `characteristic: loop` | function | Function contains a loop. | +| `characteristic: recursive call` | function | Function is recursive. | +| `characteristic: calls from` | function | There are unique calls from this function. Best used like: `count(characteristic(calls from)): 3 or more` | +| `characteristic: calls to` | function | There are unique calls to this function. Best used like: `count(characteristic(calls to)): 3 or more` | +| `characteristic: tight loop` | basic block, function | A tight loop where a basic block branches to itself. | +| `characteristic: stack string` | basic block, function | There is a sequence of instructions that looks like stack string construction. | +| `characteristic: nzxor` | instruction, basic block, function | Non-zeroing XOR instruction | +| `characteristic: peb access` | instruction, basic block, function | Access to the process environment block (PEB), e.g. via fs:[30h], gs:[60h] | +| `characteristic: fs access` | instruction, basic block, function | Access to memory via the `fs` segment. | +| `characteristic: gs access` | instruction, basic block, function | Access to memory via the `gs` segment. | +| `characteristic: cross section flow` | instruction, basic block, function | Function contains a call/jump to a different section. This is commonly seen in unpacking stubs. | +| `characteristic: indirect call` | instruction, basic block, function | Indirect call instruction; for example, `call edx` or `call qword ptr [rsp+78h]`. | +| `characteristic: call $+5` | instruction, basic block, function | Call just past the current instruction. | +| `characteristic: unmanaged call` | instruction, basic block, function | Function contains a call from managed code to unmanaged (native) code, often seen in .NET | + ### namespace A named namespace used by the logic of the program. @@ -377,6 +558,9 @@ Example: class: System.IO.File class: System.Net.WebResponse +Example rule: [create new application domain in .NET](../host-interaction/memory/create-new-application-domain-in-dotnet.yml) + + ### api A call to a named function, probably an import, though possibly a local function (like `malloc`) extracted via function signature matching like FLIRT. @@ -398,6 +582,8 @@ Example: api: System.Net.WebResponse::GetResponseStream api: System.Threading.Mutex::ctor # match creation System.Threading.Mutex object +Example rule: [switch active desktop](../host-interaction/gui/switch-active-desktop.yml) + ### property A member of a class or structure used by the logic of a program. This must include the member's class and namespace if recoverable. @@ -408,6 +594,8 @@ Example: property/read: System.Environment::OSVersion property/write: System.Net.WebRequest::Proxy +Example rule: [enumere GUI resources](../host-interaction/gui/enumerate-gui-resources.yml) + ### number A number used by the logic of the program. This should not be a stack or structure offset. @@ -436,6 +624,8 @@ If the number is only relevant on a particular architecture, don't hesitate to u - number: 4 = size of pointer ``` +Example rule: [get disk size](../host-interaction/hardware/storage/get-disk-size.yml) + ### string and substring A string referenced by the logic of the program. This is probably a pointer to an ASCII or Unicode string. @@ -480,6 +670,8 @@ Examples: Note that regex and substring matching is expensive (`O(features)` rather than `O(1)`) so they should be used sparingly. +Example rule: [identify ATM dispenser service provider](../targeting/automated-teller-machine/identify-atm-dispenser-service-provider.yml) + ### bytes A sequence of bytes referenced by the logic of the program. The provided sequence must match from the beginning of the referenced bytes and be no more than `0x100` bytes. @@ -503,6 +695,8 @@ Example rule elements: bytes: 01 14 02 00 00 00 00 00 C0 00 00 00 00 00 00 46 = CLSID_ShellLink bytes: EE 14 02 00 00 00 00 00 C0 00 00 00 00 00 00 46 = IID_IShellLink +Example rule: [hash data using Whirlpool](../nursery/hash-data-using-whirlpool.yml) + ### com COM features represent Component Object Model (COM) interfaces and classes used in the program's logic. They help identify interactions with COM objects, methods, properties, and interfaces. The parameter is the name of the COM class or interface. This feature allows you to list human-readable names instead of the byte representations found in the program. @@ -565,7 +759,8 @@ Examples: mnemonic: xor mnemonic: shl - + +Example rule: [check for trap flag exception](../anti-analysis/anti-debugging/debugger-detection/check-for-trap-flag-exception.yml) ### operand @@ -577,36 +772,7 @@ Examples: operand[0].number: 0x10 operand[1].offset: 0x2C -## basic block features -Basic block features stem from combinations of features from the instruction scope that are found within the same basic block. - -Also, the following [characteristics](#characteristic) are relevant at this scope and above: - - `tight loop` - - `stack string` - - -## function features -Function features stem from combinations of features from the instruction and basic block scopes that are found within the same function. - -Also, the following [characteristics](#characteristic) are relevant at this scope and above: - - `loop` - - `recursive call` - - `calls from` - - `calls to` - - -## file features - -File features stem from the file structure, i.e. PE structure or the raw file data. -The following features are supported at this scope: - - - [string and substring](#file-string-and-substring) - - [export](#export) - - [import](#import) - - [section](#section) - - [function-name](#function-name) - - [namespace](#namespace) - - [class](#class) +Example rule: [encrypt data using XTEA](../data-manipulation/encryption/xtea/encrypt-data-using-xtea.yml) ### file string and substring @@ -639,6 +805,8 @@ To specify a [forwarded export](https://devblogs.microsoft.com/oldnewthing/20060 export: "c:/windows/system32/version.GetFileVersionInfoA" export: "vresion.GetFileVersionInfoA" +Example rule: [act as password filter DLL](../persistence/authentication-process/act-as-password-filter-dll.yml) + ### import The name of a routine imported from a shared library. These can include DLL names that are checked during matching. @@ -650,6 +818,8 @@ Examples: import: kernel32.#22 # by ordinal import: System.IO.File::Exists +Example rule: [load NCR ATM library](../targeting/automated-teller-machine/ncr/load-ncr-atm-library.yml) + ### function-name The name of a recognized statically-linked library, such as recovered via FLIRT, or a name extracted from information contained in the file, such as .NET metadata. @@ -660,6 +830,8 @@ Examples: function-name: "?FillEncTable@Base@Rijndael@CryptoPP@@KAXXZ" function-name: Malware.Backdoor::Beacon +Example rule: [execute via .NET startup hook](../runtime/dotnet/execute-via-dotnet-startup-hook.yml) + ### section The name of a section in a structured file. @@ -669,15 +841,7 @@ Examples: section: .rsrc -## global features - -Global features are extracted at all scopes. -These are features that may be useful to both disassembly and file structure interpretation, such as the targeted OS or architecture. -The following features are supported at this scope: - - - [os](#os) - - [arch](#arch) - - [format](#format) +Example rule: [compiled with DMD](../compiler/d/compiled-with-dmd.yml) ### os @@ -727,6 +891,8 @@ Valid OSes: Note: you can match any valid OS by not specifying an `os` feature or by using `any`, e.g. `- os: any`. +Example rule: [discover group policy via gpresult](../collection/group-policy/discover-group-policy-via-gpresult.yml) + ### arch The name of the CPU architecture on which the sample runs. @@ -769,6 +935,8 @@ However, this can be useful if you have groups of many architecture-specific off This can be easier to understand than using many `offset/x32` or `offset/x64` features. +Example rule: [get process heap flags](../host-interaction/process/get-process-heap-flags.yml) + ### format The name of the file format. @@ -778,6 +946,8 @@ Valid formats: - `elf` - `dotnet` +Example rule: [access .NET resource](../executable/resource/access-dotnet-resource.yml) + ## counting Many rules will inspect the feature set for a select combination of features;