From 1d3dd9a6622af7f598dd3e1d2729bb97b1c63f6d Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Tue, 28 Nov 2023 13:25:34 +0000 Subject: [PATCH 01/15] wip: update rule format documentation with dynamic details --- doc/format.md | 90 ++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 79 insertions(+), 11 deletions(-) diff --git a/doc/format.md b/doc/format.md index 09a4db37..2906a895 100644 --- a/doc/format.md +++ b/doc/format.md @@ -43,6 +43,9 @@ We'll start at the high level structure and then dig into the logic structures a - [rule format](#rule-format) - [yaml](#yaml) - [meta block](#meta-block) + - [rule name](#rule-name) + - [rule namespace](#rule-namespace) + - [analysis flavors](#analysis-flavors) - [features block](#features-block) - [extracted features](#extracted-features) - [characteristic](#characteristic) @@ -112,7 +115,9 @@ meta: authors: - william.ballenthin@mandiant.com description: the sample appears to be packed with UPX - scope: file + scopes: + static: file + dynamic: file att&ck: - Defense Evasion::Obfuscated Files or Information [T1027.002] mbc: @@ -133,14 +138,19 @@ Here are the common fields: - `description` is optional text that describes the intent or interpretation of the rule. - - `scope` indicates to which feature set this rule applies. - Here are the legal values: - - **`file`**: matches features across the whole file. - - **`function`** (default): match features within each function. - - **`basic block`**: matches features within each basic block. - This is used to achieve locality in rules (for example for parameters of a function). - - **`instruction`**: matches features found at a single instruction. - This is great to identify structure access or comparisons against magic constants. + - `scopes` indicates which feature set the rule applies to, when analyzing static or dynamic analysis artifacts. There are two required sub fields: `static` and `dynamic`. 
Here are the legal values: + - `scopes.static`: + - **`file`**: matches features across the whole file. + - **`function`** (default): match features within each function. + - **`basic block`**: matches features within each basic block. + This is used to achieve locality in rules (for example for parameters of a function). + - **`instruction`**: matches features found at a single instruction. + This is great to identify structure access or comparisons against magic constants. + - `scopes.dynamic`: + - **`file`**: matches features across the whole file, including from the executable file features *and* across the entire runtime trace. + - **`process`**: match features within each process. + - **`thread`**: match features within each thread, such as sequence of API names. + - **`call`**: match features at each traced API call site, such as API name and argument values. - `att&ck` is an optional list of [ATT&CK framework](https://attack.mitre.org/) techniques that the rule implies, like `Discovery::Query Registry [T1012]` or `Persistence::Create or Modify System Process::Windows Service [T1543.003]`. @@ -252,6 +262,35 @@ rules/host-interaction/file-system/list The depth of the namespace tree is not limited, but we've found that 3-4 components is typically sufficient. +### analysis flavors + +capa analyzes capabilities found in both executable files and in API traces captured by sandboxes, such as CAPE. +We call these categories of analysis "flavors" and use "static analysis flavor" and "dynamic analysis flavor" to refer to them, respectively. Static analysis is great for reviewing the entire logic of a program and finding the interesting regions. Dynamic analysis via sandboxes helps bypass packing, which is very widespread in malware, and can better describe the actual runtime behavior of a program. We use the `meta.scopes.$flavor` key to specify how a rule interacts with a particular flavor. 
+
+When possible, we try to write capa rules that work in both static and dynamic analysis flavors.
+For example, here's a rule that matches in both flavors:
+
+```yml
+TODO
+```
+
+See how XYZ can be reasoned about both by inspecting the disassembly features (static analysis) as well as the runtime API trace (dynamic analysis)? TODO
+
+On the other hand, some behaviors are best described by rules that work in only one scope.
+(Remember, its paramount that rules be human-readable, so avoid complicating logic for the sake of merging rules.)
+In this case, mark the excluded scope with `unsupported`, like in the following rule:
+
+```yml
+TODO
+```
+
+ABC works great becauses of DEF, but doesn't work in GHI scope because of JKL. TODO.
+
+As you'll see in the [extracted features](#extracted-features) section, capa matches features at various scopes, starting small (e.g., instruction) and growing large (e.g., file). In static analysis, scopes grow from instruction, to basic block, function, and then file. In dynamic analysis, scopes from call, to thread, process, and then to file.
+
+When matching a sequence of API calls, the static scope is often "function" and the dynamic scope is "thread". When matching a single API call with arguments, the static scope is usually "basic block" and the dynamic scope is "call". One day we hope to support "call" scope directly in the static analysis flavor.
+
+
 ## features block
 
 This section declares logical statements about the features that must exist for the rule to match.
@@ -288,20 +327,45 @@ If only one of these features is found in a function, the rule will not match.
 
 # extracted features
 
-capa extracts features from multiple scopes, starting with the most specific (instruction) and working towards the most general:
+capa matches features at multiple scopes, starting small (e.g., instruction) and growing large (e.g., file). In static analysis, scopes grow from instruction, to basic block, function, and then file.
In dynamic analysis, scopes from call, to thread, process, and then to file: | scope | best for... | |-------------|------------------------------------------------------------------------------------------| +| (static) | --- | | instruction | specific combinations of mnemonics, operands, constants, etc. to find magic values | | basic block | closely related instructions, such as structure access or function call arguments | | function | collections of API calls, constants, etc. that suggest complete capabilities | +| (dynamic) | --- | +| call | single API call and its arguments | +| thread | sequence of related API calls | +| process | combinations of other capabilities found within a (potentially multi-threaded) program | +| (common) | --- | | file | high level conclusions, like encryptor, backdoor, or statically linked with some library | -| (global) | the features available at every scope, like arch or OS | +| global | the features available at every scope, like arch or OS | In general, capa collects and merges the features from lower scopes into higher scopes; for example, features extracted from individual instructions are merged into the function scope that contains the instructions. This way, you can use the match results against instructions ("the constant X is for crypto algorithm Y") to recognize function-level capabilities ("crypto function Z"). +## complete feature listing + +TODO: make this table complete, with links + + feature static dynamic + --------- ------------ ------- + api instruction call + number instruction call + string instruction call + bytes instruction call + offset instruction - + mnemonic instruction - + operand instruction - + import file file + export file file + os global global + arch global global + format global global + ### characteristic @@ -310,6 +374,8 @@ They are one-off features that seem interesting to the authors. For example, the `characteristic: nzxor` feature describes non-zeroing XOR instructions. 
+TODO: add links to rules with each of these characteristics. + | characteristic | scope | description | |--------------------------------------|------------------------------------|-------------| | `characteristic: embedded pe` | file | (XOR encoded) embedded PE files. | @@ -367,6 +433,8 @@ Example: namespace: System.IO namespace: System.Net +TODO: add reference to rule with this feature, and for all other features. + ### class A named class used by the logic of the program. This must include the class's namespace if recoverable. From 45f758ed58208443a77427ee4f77d94bf0aa91df Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Wed, 29 Nov 2023 12:33:12 +0000 Subject: [PATCH 02/15] format: add example links --- doc/format.md | 149 +++++++++++++++++++++++++++++++++++--------------- 1 file changed, 104 insertions(+), 45 deletions(-) diff --git a/doc/format.md b/doc/format.md index 2906a895..a05ae728 100644 --- a/doc/format.md +++ b/doc/format.md @@ -271,20 +271,49 @@ When possible, we try to write capa rules that work in both static and dynamic a For example, here's a rule that matches in both flavors: ```yml -TODO +rule: + meta: + name: create mutex + namespace: host-interaction/mutex + authors: + - moritz.raabe@mandiant.com + - michael.hunhoff@mandiant.com + scopes: + static: function + dynamic: call + features: + - or: + - api: kernel32.CreateMutex + - api: kernel32.CreateMutexEx + - api: System.Threading.Mutex::ctor ``` -See how XYZ can be reasoned about both by inspecting the disassembly features (static analysis) as well as the runtime API trace (dynamic analysis)? TODO +See how "create mutex" can be reasoned about both by inspecting the disassembly features (static analysis) as well as the runtime API trace (dynamic analysis)? On the other hand, some behaviors are best described by rules that work in only one scope. (Remember, its paramount that rules be human-readable, so avoid complicating logic for the sake of merging rules.) 
In this case, mark the excluded scope with `unsupported`, like in the following rule: ```yml -TODO +rule: + meta: + name: check for software breakpoints + namespace: anti-analysis/anti-debugging/debugger-detection + authors: + - michael.hunhoff@mandiant.com + scopes: + static: function + dynamic: unsupported # requires mnemonic features + features: + - and: + - or: + - instruction: + - mnemonic: cmp + - number: 0xCC = INT3 + - match: contain loop ``` -ABC works great becauses of DEF, but doesn't work in GHI scope because of JKL. TODO. +"check for software breakpoints" works great during disassembly analysis, such as mnemonic and operand matching, but doesn't work in dynamic scopes because these features aren't available. So, we mark the rule `scopes.dynamic: unsupported` so the rule won't be considered when processing sandbox traces. As you'll see in the [extracted features](#extracted-features) section, capa matches features at various scopes, starting small (e.g., instruction) and growing large (e.g., file). In static analysis, scopes grow from instruction, to basic block, function, and then file. In dynamic analysis, scopes from call, to thread, process, and then to file. @@ -347,24 +376,27 @@ In general, capa collects and merges the features from lower scopes into higher for example, features extracted from individual instructions are merged into the function scope that contains the instructions. This way, you can use the match results against instructions ("the constant X is for crypto algorithm Y") to recognize function-level capabilities ("crypto function Z"). 
-## complete feature listing - -TODO: make this table complete, with links - - feature static dynamic - --------- ------------ ------- - api instruction call - number instruction call - string instruction call - bytes instruction call - offset instruction - - mnemonic instruction - - operand instruction - - import file file - export file file - os global global - arch global global - format global global +| feature | static scope | dynamic scope | +|-----------------------------------|--------------|---------------| +| [api](#api) | instruction | call | +| [string](#string-and-substring) | instruction | call | +| [bytes](#bytes) | instruction | call | +| [number](#number) | instruction | call | +| [characteristic](#characteristic) | instruction | - | +| [mnemonic](#mnemonic) | instruction | - | +| [operand](#operand) | instruction | - | +| [offset](#offset) | instruction | - | +| [com](#com) | instruction | - | +| [namespace](#namespace) | instruction | - | +| [class](#class) | instruction | - | +| [property](#property) | instruction | - | +| [export](#export) | file | file | +| [import](#import) | file | file | +| [section](#section) | file | file | +| [function-name](#function-name) | file | - | +| [os](#os) | global | global | +| [arch](#arch) | global | global | +| [format](#format) | global | global | ### characteristic @@ -374,27 +406,25 @@ They are one-off features that seem interesting to the authors. For example, the `characteristic: nzxor` feature describes non-zeroing XOR instructions. -TODO: add links to rules with each of these characteristics. - -| characteristic | scope | description | -|--------------------------------------|------------------------------------|-------------| -| `characteristic: embedded pe` | file | (XOR encoded) embedded PE files. | -| `characteristic: forwarded export` | file | PE file has a forwarded export. 
| -| `characteristic: mixed mode` | file | File contains both managed and unmanaged (native) code, often seen in .NET | -| `characteristic: loop` | function | Function contains a loop. | -| `characteristic: recursive call` | function | Function is recursive. | +| characteristic | scope | description | +|--------------------------------------|------------------------------------|-----------------------------------------------------------------------------------------------------------| +| `characteristic: embedded pe` | file | (XOR encoded) embedded PE files. | +| `characteristic: forwarded export` | file | PE file has a forwarded export. | +| `characteristic: mixed mode` | file | File contains both managed and unmanaged (native) code, often seen in .NET | +| `characteristic: loop` | function | Function contains a loop. | +| `characteristic: recursive call` | function | Function is recursive. | | `characteristic: calls from` | function | There are unique calls from this function. Best used like: `count(characteristic(calls from)): 3 or more` | -| `characteristic: calls to` | function | There are unique calls to this function. Best used like: `count(characteristic(calls to)): 3 or more` | -| `characteristic: tight loop` | basic block, function | A tight loop where a basic block branches to itself. | -| `characteristic: stack string` | basic block, function | There is a sequence of instructions that looks like stack string construction. | -| `characteristic: nzxor` | instruction, basic block, function | Non-zeroing XOR instruction | -| `characteristic: peb access` | instruction, basic block, function | Access to the process environment block (PEB), e.g. via fs:[30h], gs:[60h] | -| `characteristic: fs access` | instruction, basic block, function | Access to memory via the `fs` segment. | -| `characteristic: gs access` | instruction, basic block, function | Access to memory via the `gs` segment. 
| -| `characteristic: cross section flow` | instruction, basic block, function | Function contains a call/jump to a different section. This is commonly seen in unpacking stubs. | -| `characteristic: indirect call` | instruction, basic block, function | Indirect call instruction; for example, `call edx` or `call qword ptr [rsp+78h]`. | -| `characteristic: call $+5` | instruction, basic block, function | Call just past the current instruction. | -| `characteristic: unmanaged call` | instruction, basic block, function | Function contains a call from managed code to unmanaged (native) code, often seen in .NET | +| `characteristic: calls to` | function | There are unique calls to this function. Best used like: `count(characteristic(calls to)): 3 or more` | +| `characteristic: tight loop` | basic block, function | A tight loop where a basic block branches to itself. | +| `characteristic: stack string` | basic block, function | There is a sequence of instructions that looks like stack string construction. | +| `characteristic: nzxor` | instruction, basic block, function | Non-zeroing XOR instruction | +| `characteristic: peb access` | instruction, basic block, function | Access to the process environment block (PEB), e.g. via fs:[30h], gs:[60h] | +| `characteristic: fs access` | instruction, basic block, function | Access to memory via the `fs` segment. | +| `characteristic: gs access` | instruction, basic block, function | Access to memory via the `gs` segment. | +| `characteristic: cross section flow` | instruction, basic block, function | Function contains a call/jump to a different section. This is commonly seen in unpacking stubs. | +| `characteristic: indirect call` | instruction, basic block, function | Indirect call instruction; for example, `call edx` or `call qword ptr [rsp+78h]`. | +| `characteristic: call $+5` | instruction, basic block, function | Call just past the current instruction. 
| +| `characteristic: unmanaged call` | instruction, basic block, function | Function contains a call from managed code to unmanaged (native) code, often seen in .NET | ## instruction features @@ -433,8 +463,6 @@ Example: namespace: System.IO namespace: System.Net -TODO: add reference to rule with this feature, and for all other features. - ### class A named class used by the logic of the program. This must include the class's namespace if recoverable. @@ -445,6 +473,9 @@ Example: class: System.IO.File class: System.Net.WebResponse +Example rule: [create new application domain in .NET](../host-interaction/memory/create-new-application-domain-in-dotnet.yml) + + ### api A call to a named function, probably an import, though possibly a local function (like `malloc`) extracted via function signature matching like FLIRT. @@ -466,6 +497,8 @@ Example: api: System.Net.WebResponse::GetResponseStream api: System.Threading.Mutex::ctor # match creation System.Threading.Mutex object +Example rule: [switch active desktop](../host-interaction/gui/switch-active-desktop.yml) + ### property A member of a class or structure used by the logic of a program. This must include the member's class and namespace if recoverable. @@ -476,6 +509,8 @@ Example: property/read: System.Environment::OSVersion property/write: System.Net.WebRequest::Proxy +Example rule: [enumere GUI resources](../host-interaction/gui/enumerate-gui-resources.yml) + ### number A number used by the logic of the program. This should not be a stack or structure offset. @@ -504,6 +539,8 @@ If the number is only relevant on a particular architecture, don't hesitate to u - number: 4 = size of pointer ``` +Example rule: [get disk size](../host-interaction/hardware/storage/get-disk-size.yml) + ### string and substring A string referenced by the logic of the program. This is probably a pointer to an ASCII or Unicode string. 
@@ -548,6 +585,8 @@ Examples: Note that regex and substring matching is expensive (`O(features)` rather than `O(1)`) so they should be used sparingly. +Example rule: [identify ATM dispenser service provider](../targeting/automated-teller-machine/identify-atm-dispenser-service-provider.yml) + ### bytes A sequence of bytes referenced by the logic of the program. The provided sequence must match from the beginning of the referenced bytes and be no more than `0x100` bytes. @@ -571,6 +610,8 @@ Example rule elements: bytes: 01 14 02 00 00 00 00 00 C0 00 00 00 00 00 00 46 = CLSID_ShellLink bytes: EE 14 02 00 00 00 00 00 C0 00 00 00 00 00 00 46 = IID_IShellLink +Example rule: [hash data using Whirlpool](../nursery/hash-data-using-whirlpool.yml) + ### com COM features represent Component Object Model (COM) interfaces and classes used in the program's logic. They help identify interactions with COM objects, methods, properties, and interfaces. The parameter is the name of the COM class or interface. This feature allows you to list human-readable names instead of the byte representations found in the program. @@ -633,7 +674,9 @@ Examples: mnemonic: xor mnemonic: shl - + + +Example rule: [check for trap flag exception](../anti-analysis/anti-debugging/debugger-detection/check-for-trap-flag-exception.yml) ### operand @@ -645,6 +688,8 @@ Examples: operand[0].number: 0x10 operand[1].offset: 0x2C +Example rule: [encrypt data using XTEA](../data-manipulation/encryption/xtea/encrypt-data-using-xtea.yml) + ## basic block features Basic block features stem from combinations of features from the instruction scope that are found within the same basic block. 
@@ -707,6 +752,8 @@ To specify a [forwarded export](https://devblogs.microsoft.com/oldnewthing/20060 export: "c:/windows/system32/version.GetFileVersionInfoA" export: "vresion.GetFileVersionInfoA" +Example rule: [act as password filter DLL](../persistence/authentication-process/act-as-password-filter-dll.yml) + ### import The name of a routine imported from a shared library. These can include DLL names that are checked during matching. @@ -718,6 +765,8 @@ Examples: import: kernel32.#22 # by ordinal import: System.IO.File::Exists +Example rule: [load NCR ATM library](../targeting/automated-teller-machine/ncr/load-ncr-atm-library.yml) + ### function-name The name of a recognized statically-linked library, such as recovered via FLIRT, or a name extracted from information contained in the file, such as .NET metadata. @@ -728,6 +777,8 @@ Examples: function-name: "?FillEncTable@Base@Rijndael@CryptoPP@@KAXXZ" function-name: Malware.Backdoor::Beacon +Example rule: [execute via .NET startup hook](../runtime/dotnet/execute-via-dotnet-startup-hook.yml) + ### section The name of a section in a structured file. @@ -737,6 +788,8 @@ Examples: section: .rsrc +Example rule: [compiled with DMD](../compiler/d/compiled-with-dmd.yml) + ## global features Global features are extracted at all scopes. @@ -795,6 +848,8 @@ Valid OSes: Note: you can match any valid OS by not specifying an `os` feature or by using `any`, e.g. `- os: any`. +Example rule: [discover group policy via gpresult](../collection/group-policy/discover-group-policy-via-gpresult.yml) + ### arch The name of the CPU architecture on which the sample runs. @@ -837,6 +892,8 @@ However, this can be useful if you have groups of many architecture-specific off This can be easier to understand than using many `offset/x32` or `offset/x64` features. +Example rule: [get process heap flags](../host-interaction/process/get-process-heap-flags.yml) + ### format The name of the file format. 
@@ -846,6 +903,8 @@ Valid formats: - `elf` - `dotnet` +Example rule: [access .NET resource](../executable/resource/access-dotnet-resource.yml) + ## counting Many rules will inspect the feature set for a select combination of features; From 2b87bf02a91794986e7c73fdf6a22ea6544d8ff5 Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Wed, 29 Nov 2023 12:58:11 +0000 Subject: [PATCH 03/15] format: reorganize features vs scopes --- doc/format.md | 210 ++++++++++++++++++++++++++++++-------------------- 1 file changed, 126 insertions(+), 84 deletions(-) diff --git a/doc/format.md b/doc/format.md index a05ae728..9fcac59e 100644 --- a/doc/format.md +++ b/doc/format.md @@ -48,8 +48,19 @@ We'll start at the high level structure and then dig into the logic structures a - [analysis flavors](#analysis-flavors) - [features block](#features-block) - [extracted features](#extracted-features) - - [characteristic](#characteristic) - - [instruction features](#instruction-features) + - [static analysis scopes](#static-analysis-scopes) + - [instruction features](#instruction-features) + - [basic block features](#basic-block-features) + - [function features](#function-features) + - [dynamic analysis scopes](#dynamic-analysis-scopes) + - [call features](#call-features) + - [thread features](#thread-features) + - [process features](#process-features) + - [common scopes](#common-scopes) + - [file features](#file-features) + - [global features](#global-features) + - [complete feature listing](#complete-feature-listing) + - [characteristic](#characteristic) - [namespace](#namespace) - [class](#class) - [api](#api) @@ -60,9 +71,6 @@ We'll start at the high level structure and then dig into the logic structures a - [offset](#offset) - [mnemonic](#mnemonic) - [operand](#operand) - - [basic block features](#basic-block-features) - - [function features](#function-features) - - [file features](#file-features) - [string and substring](#file-string-and-substring) - [export](#export) - 
[import](#import) @@ -70,7 +78,6 @@ We'll start at the high level structure and then dig into the logic structures a - [function-name](#function-name) - [namespace](#namespace) - [class](#class) - - [global features](#global-features) - [os](#os) - [arch](#arch) - [format](#format) @@ -113,7 +120,7 @@ meta: name: packed with UPX namespace: anti-analysis/packer/upx authors: - - william.ballenthin@mandiant.com + - william.ballenthin@mandiant.com description: the sample appears to be packed with UPX scopes: static: file @@ -121,13 +128,12 @@ meta: att&ck: - Defense Evasion::Obfuscated Files or Information [T1027.002] mbc: - - Anti-Static Analysis::Software Packing + - Anti-Static Analysis::Software Packing examples: - CD2CBA9E6313E8DF2C1273593E649682 - Practical Malware Analysis Lab 01-02.exe_:0x0401000 ``` - Here are the common fields: - `name` is required. This string should uniquely identify the rule. More details below. @@ -140,17 +146,17 @@ Here are the common fields: - `scopes` indicates which feature set the rule applies to, when analyzing static or dynamic analysis artifacts. There are two required sub fields: `static` and `dynamic`. Here are the legal values: - `scopes.static`: - - **`file`**: matches features across the whole file. - - **`function`** (default): match features within each function. - - **`basic block`**: matches features within each basic block. - This is used to achieve locality in rules (for example for parameters of a function). - **`instruction`**: matches features found at a single instruction. This is great to identify structure access or comparisons against magic constants. + - **`basic block`**: matches features within each basic block. + This is used to achieve close locality in rules (for example for parameters of a function). + - **`function`**: match features within each function. + - **`file`**: matches features across the whole file. 
- `scopes.dynamic`: - - **`file`**: matches features across the whole file, including from the executable file features *and* across the entire runtime trace. - - **`process`**: match features within each process. - - **`thread`**: match features within each thread, such as sequence of API names. - **`call`**: match features at each traced API call site, such as API name and argument values. + - **`thread`**: match features within each thread, such as sequence of API names. + - **`process`**: match features within each process. + - **`file`**: matches features across the whole file, including from the executable file features *and* across the entire runtime trace. - `att&ck` is an optional list of [ATT&CK framework](https://attack.mitre.org/) techniques that the rule implies, like `Discovery::Query Registry [T1012]` or `Persistence::Create or Modify System Process::Windows Service [T1543.003]`. @@ -398,6 +404,110 @@ This way, you can use the match results against instructions ("the constant X is | [arch](#arch) | global | global | | [format](#format) | global | global | +## static analysis scopes + +### instruction features + +Instruction features stem from individual instructions, such as mnemonics, string references, or function calls. +The following features are relevant at this scope and above: + + - [namespace](#namespace) + - [class](#class) + - [api](#api) + - [property](#property) + - [number](#number) + - [string and substring](#string-and-substring) + - [bytes](#bytes) + - [com](#com) + - [offset](#offset) + - [mnemonic](#mnemonic) + - [operand](#operand) + +Also, the following [characteristics](#characteristic) are relevant at this scope and above: + - `nzxor` + - `peb access` + - `fs access` + - `gs access` + - `cross section flow` + - `indirect call` + - `call $+5` + - `unmanaged call` + +### basic block features + +Basic block features stem from combinations of features from the instruction scope that are found within the same basic block. 
+ +Also, the following [characteristics](#characteristic) are relevant at this scope and above: + - `tight loop` + - `stack string` + +### function features + +Function features stem from combinations of features from the instruction and basic block scopes that are found within the same function. + +Also, the following [characteristics](#characteristic) are relevant at this scope and above: + - `loop` + - `recursive call` + - `calls from` + - `calls to` + +## dynamic analysis scopes + +### call features + +Call features are collected from individual sandbox trace events, such as API calls. +They're typically useful for matching against the API name and arguments (strings or integer constants). + +The following features are relevant at this scope and above: + + - [api](#api) + - [number](#number) + - [string and substring](#string-and-substring) + - [bytes](#bytes) + +### thread features + +Thread features stem from combinations of features from the call scopes that are found within the same thread. +This is useful for matching a sequence of API calls, such as `OpenFile`/`ReadFile`/`CloseFile`. + +There are no thread-specific features. + +### process features + +Process features are combinations of features from the thread scopes found within the same process. +This is useful for matching behaviors found across an entire program, even if its multi-threaded. + +There are no process-specific features. + +## common scopes + +### file features + +File features stem from the file structure, i.e. PE structure or the raw file data. + +Also, all features found in all functions (static) or all processes (dynamic) are collected into the file scope. 
+ +The following features are supported at this scope: + + - [string and substring](#file-string-and-substring) + - [export](#export) + - [import](#import) + - [section](#section) + - [function-name](#function-name) + - [namespace](#namespace) + - [class](#class) + +### global features + +Global features are extracted at all scopes. +These are features that may be useful to both disassembly and file structure interpretation, such as the targeted OS or architecture. +The following features are supported at this scope: + + - [os](#os) + - [arch](#arch) + - [format](#format) + +## complete feature listing ### characteristic @@ -426,33 +536,6 @@ For example, the `characteristic: nzxor` feature describes non-zeroing XOR instr | `characteristic: call $+5` | instruction, basic block, function | Call just past the current instruction. | | `characteristic: unmanaged call` | instruction, basic block, function | Function contains a call from managed code to unmanaged (native) code, often seen in .NET | -## instruction features - -Instruction features stem from individual instructions, such as mnemonics, string references, or function calls. -The following features are relevant at this scope and above: - - - [namespace](#namespace) - - [class](#class) - - [api](#api) - - [property](#property) - - [number](#number) - - [string and substring](#string-and-substring) - - [bytes](#bytes) - - [com](#com) - - [offset](#offset) - - [mnemonic](#mnemonic) - - [operand](#operand) - -Also, the following [characteristics](#characteristic) are relevant at this scope and above: - - `nzxor` - - `peb access` - - `fs access` - - `gs access` - - `cross section flow` - - `indirect call` - - `call $+5` - - `unmanaged call` - ### namespace A named namespace used by the logic of the program. 
@@ -690,37 +773,6 @@ Examples: Example rule: [encrypt data using XTEA](../data-manipulation/encryption/xtea/encrypt-data-using-xtea.yml) -## basic block features -Basic block features stem from combinations of features from the instruction scope that are found within the same basic block. - -Also, the following [characteristics](#characteristic) are relevant at this scope and above: - - `tight loop` - - `stack string` - - -## function features -Function features stem from combinations of features from the instruction and basic block scopes that are found within the same function. - -Also, the following [characteristics](#characteristic) are relevant at this scope and above: - - `loop` - - `recursive call` - - `calls from` - - `calls to` - - -## file features - -File features stem from the file structure, i.e. PE structure or the raw file data. -The following features are supported at this scope: - - - [string and substring](#file-string-and-substring) - - [export](#export) - - [import](#import) - - [section](#section) - - [function-name](#function-name) - - [namespace](#namespace) - - [class](#class) - ### file string and substring An ASCII or UTF-16 LE string present in the file. @@ -790,16 +842,6 @@ Examples: Example rule: [compiled with DMD](../compiler/d/compiled-with-dmd.yml) -## global features - -Global features are extracted at all scopes. -These are features that may be useful to both disassembly and file structure interpretation, such as the targeted OS or architecture. -The following features are supported at this scope: - - - [os](#os) - - [arch](#arch) - - [format](#format) - ### os The name of the OS on which the sample runs. This is determined via heuristics applied to the file format (e.g. PE files are for Windows, header fields and notes sections in ELF files indicate Linux/*BSD/etc.). 
From 1f1407688aa4875f1b0f68c5a4c8a818ab3b9d51 Mon Sep 17 00:00:00 2001
From: Willi Ballenthin
Date: Wed, 29 Nov 2023 14:08:09 +0100
Subject: [PATCH 04/15] Update doc/format.md

Co-authored-by: Moritz
---
 doc/format.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/format.md b/doc/format.md
index 9fcac59e..191210fa 100644
--- a/doc/format.md
+++ b/doc/format.md
@@ -294,7 +294,7 @@ rule:
       - api: System.Threading.Mutex::ctor
 ```
 
-See how "create mutex" can be reasoned about both by inspecting the disassembly features (static analysis) as well as the runtime API trace (dynamic analysis)?
+See how `create mutex` can be reasoned about both by inspecting the disassembly features (static analysis) as well as the runtime API trace (dynamic analysis)?
 
 On the other hand, some behaviors are best described by rules that work in only one scope.
 (Remember, its paramount that rules be human-readable, so avoid complicating logic for the sake of merging rules.)

From 22054879257324c5f09eae801646ab1bfe6f30de Mon Sep 17 00:00:00 2001
From: Willi Ballenthin
Date: Wed, 29 Nov 2023 14:08:16 +0100
Subject: [PATCH 05/15] Update doc/format.md

Co-authored-by: Moritz
---
 doc/format.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/format.md b/doc/format.md
index 191210fa..cf658a71 100644
--- a/doc/format.md
+++ b/doc/format.md
@@ -297,7 +297,7 @@ rule:
 See how `create mutex` can be reasoned about both by inspecting the disassembly features (static analysis) as well as the runtime API trace (dynamic analysis)?
 
 On the other hand, some behaviors are best described by rules that work in only one scope.
-(Remember, its paramount that rules be human-readable, so avoid complicating logic for the sake of merging rules.)
+Remember, its paramount that rules be human-readable, so avoid complicating logic for the sake of merging rules.
In this case, mark the excluded scope with `unsupported`, like in the following rule: ```yml From 5250f9cf5e356b8e5f0e4e21f20f27349d670477 Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Wed, 29 Nov 2023 14:08:31 +0100 Subject: [PATCH 06/15] Update doc/format.md Co-authored-by: Moritz --- doc/format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/format.md b/doc/format.md index cf658a71..91db805a 100644 --- a/doc/format.md +++ b/doc/format.md @@ -319,7 +319,7 @@ rule: - match: contain loop ``` -"check for software breakpoints" works great during disassembly analysis, such as mnemonic and operand matching, but doesn't work in dynamic scopes because these features aren't available. So, we mark the rule `scopes.dynamic: unsupported` so the rule won't be considered when processing sandbox traces. +`check for software breakpoints` works great during disassembly analysis, where low-level instruction features can be matched, but doesn't work in dynamic scopes because these features aren't available. Hence, we mark the rule `scopes.dynamic: unsupported` so the rule won't be considered when processing sandbox traces. As you'll see in the [extracted features](#extracted-features) section, capa matches features at various scopes, starting small (e.g., instruction) and growing large (e.g., file). In static analysis, scopes grow from instruction, to basic block, function, and then file. In dynamic analysis, scopes from call, to thread, process, and then to file. 
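
The effect of `scopes.dynamic: unsupported` described in the patch above can be pictured as a filter applied before matching. The following is only a minimal sketch under stated assumptions: the `Rule` class and `rules_for_flavor` helper are hypothetical illustrations, not capa's actual API, and the scope values assigned to the two example rules (other than `dynamic: unsupported`) are assumed for the sake of the example.

```python
from dataclasses import dataclass

UNSUPPORTED = "unsupported"


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a parsed rule; not capa's real rule class."""

    name: str
    static_scope: str
    dynamic_scope: str


def rules_for_flavor(rules, flavor):
    """Return the rules whose scope for the given flavor is not `unsupported`."""
    if flavor not in ("static", "dynamic"):
        raise ValueError(f"unknown flavor: {flavor}")
    selected = []
    for rule in rules:
        scope = rule.static_scope if flavor == "static" else rule.dynamic_scope
        if scope != UNSUPPORTED:
            selected.append(rule)
    return selected


# Scope values here are assumptions for illustration only.
RULES = [
    Rule("create mutex", "function", "thread"),
    Rule("check for software breakpoints", "function", UNSUPPORTED),
]

if __name__ == "__main__":
    for flavor in ("static", "dynamic"):
        names = [rule.name for rule in rules_for_flavor(RULES, flavor)]
        print(flavor, names)
```

Under these assumptions, the breakpoint rule is only considered for the static flavor, while the mutex rule is considered for both.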
From 8d2a7a4aecc06bac870f6eb82afb41f638aa3caa Mon Sep 17 00:00:00 2001
From: Willi Ballenthin
Date: Wed, 29 Nov 2023 14:08:38 +0100
Subject: [PATCH 07/15] Update doc/format.md

Co-authored-by: Moritz
---
 doc/format.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/format.md b/doc/format.md
index 91db805a..bef2341d 100644
--- a/doc/format.md
+++ b/doc/format.md
@@ -321,7 +321,7 @@ rule:

 `check for software breakpoints` works great during disassembly analysis, where low-level instruction features can be matched, but doesn't work in dynamic scopes because these features aren't available. Hence, we mark the rule `scopes.dynamic: unsupported` so the rule won't be considered when processing sandbox traces.

-As you'll see in the [extracted features](#extracted-features) section, capa matches features at various scopes, starting small (e.g., instruction) and growing large (e.g., file). In static analysis, scopes grow from instruction, to basic block, function, and then file. In dynamic analysis, scopes from call, to thread, process, and then to file.
+As you'll see in the [extracted features](#extracted-features) section, capa matches features at various scopes, starting small (e.g., `instruction`) and growing large (e.g., `file`). In static analysis, scopes grow from `instruction`, to `basic block`, `function`, and then `file`. In dynamic analysis, scopes grow from `call`, to `thread`, `process`, and then to `file`.

 When matching a sequence of API calls, the static scope is often "function" and the dynamic scope is "thread". When matching a single API call with arguments, the static scope is usually "basic block" and the dynamic scope is "call". One day we hope to support "call" scope directly in the static analysis flavor.
From 8385a68f6230445fe09e0287abda7013bf9653ac Mon Sep 17 00:00:00 2001
From: Willi Ballenthin
Date: Wed, 29 Nov 2023 14:08:49 +0100
Subject: [PATCH 08/15] Update doc/format.md

Co-authored-by: Moritz
---
 doc/format.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/format.md b/doc/format.md
index bef2341d..eea1fb11 100644
--- a/doc/format.md
+++ b/doc/format.md
@@ -323,7 +323,7 @@ rule:

 As you'll see in the [extracted features](#extracted-features) section, capa matches features at various scopes, starting small (e.g., `instruction`) and growing large (e.g., `file`). In static analysis, scopes grow from `instruction`, to `basic block`, `function`, and then `file`. In dynamic analysis, scopes grow from `call`, to `thread`, `process`, and then to `file`.

-When matching a sequence of API calls, the static scope is often "function" and the dynamic scope is "thread". When matching a single API call with arguments, the static scope is usually "basic block" and the dynamic scope is "call". One day we hope to support "call" scope directly in the static analysis flavor.
+When matching a sequence of API calls, the static scope is often `function` and the dynamic scope is `thread`. When matching a single API call with arguments, the static scope is usually `basic block` and the dynamic scope is `call`. One day we hope to support `call` scope directly in the static analysis flavor.

## features block From 09f6327761b4320e16d15c069fbfefd91a41935c Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Wed, 29 Nov 2023 14:09:01 +0100 Subject: [PATCH 09/15] Update doc/format.md Co-authored-by: Moritz --- doc/format.md | 1 - 1 file changed, 1 deletion(-) diff --git a/doc/format.md b/doc/format.md index eea1fb11..80e9bd36 100644 --- a/doc/format.md +++ b/doc/format.md @@ -758,7 +758,6 @@ Examples: mnemonic: xor mnemonic: shl - Example rule: [check for trap flag exception](../anti-analysis/anti-debugging/debugger-detection/check-for-trap-flag-exception.yml) ### operand From 11857d35db3df75f52814f466c0b7c550e2fbade Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Wed, 29 Nov 2023 14:09:18 +0100 Subject: [PATCH 10/15] Update doc/format.md Co-authored-by: Moritz --- doc/format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/format.md b/doc/format.md index 80e9bd36..27131d65 100644 --- a/doc/format.md +++ b/doc/format.md @@ -376,7 +376,7 @@ capa matches features at multiple scopes, starting small (e.g., instruction) and | process | combinations of other capabilities found within a (potentially multi-threaded) program | | (common) | --- | | file | high level conclusions, like encryptor, backdoor, or statically linked with some library | -| global | the features available at every scope, like arch or OS | +| global | the features available at every scope, like architechture or OS | In general, capa collects and merges the features from lower scopes into higher scopes; for example, features extracted from individual instructions are merged into the function scope that contains the instructions. 
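
The merging of lower-scope features into higher scopes, described in the context lines above, can be sketched roughly as follows. This is an illustrative sketch, not capa's implementation: `merge_into_functions`, `function_matches`, and the function names such as `sub_401000` are hypothetical. The constant `0x9E3779B9` is the delta value used by TEA/XTEA and stands in for "the constant X is for crypto algorithm Y".

```python
from collections import defaultdict


def merge_into_functions(instruction_features):
    """Merge (function_name, feature) pairs seen at instructions into per-function sets."""
    merged = defaultdict(set)
    for function_name, feature in instruction_features:
        merged[function_name].add(feature)
    return dict(merged)


def function_matches(function_features, required):
    """A function-scope match succeeds when every required feature was merged in."""
    return set(required).issubset(function_features)


# Hypothetical features extracted at the instruction scope.
EXTRACTED = [
    ("sub_401000", "mnemonic: xor"),
    ("sub_401000", "number: 0x9E3779B9"),
    ("sub_401000", "mnemonic: shl"),
    ("sub_402000", "api: CreateMutexW"),
]

if __name__ == "__main__":
    merged = merge_into_functions(EXTRACTED)
    wanted = {"mnemonic: shl", "number: 0x9E3779B9"}
    for name, features in merged.items():
        print(name, function_matches(features, wanted))
```

The point of the sketch is that a function-level check never needs to look at individual instructions again; it only inspects the merged set.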
From f07b6d04d1738dcd56ec37f25f879b2432a5f046 Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Wed, 29 Nov 2023 14:09:31 +0100 Subject: [PATCH 11/15] Update doc/format.md Co-authored-by: Moritz --- doc/format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/format.md b/doc/format.md index 27131d65..b6b57cec 100644 --- a/doc/format.md +++ b/doc/format.md @@ -362,7 +362,7 @@ If only one of these features is found in a function, the rule will not match. # extracted features -capa matches features at multiple scopes, starting small (e.g., instruction) and growing large (e.g., file). In static analysis, scopes grow from instruction, to basic block, function, and then file. In dynamic analysis, scopes from call, to thread, process, and then to file: +capa matches features at multiple scopes, starting small (e.g., `instruction`) and growing large (e.g., `file`). In static analysis, scopes grow from `instruction`, to `basic block`, `function`, and then `file`. In dynamic analysis, scopes grow from `call`, to `thread`, `process`, and then to `file`: | scope | best for... | |-------------|------------------------------------------------------------------------------------------| From fe77ae430dc841f03426b0ed6cc8452128c6cb4f Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Wed, 29 Nov 2023 13:14:57 +0000 Subject: [PATCH 12/15] format: table formatting --- doc/format.md | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/doc/format.md b/doc/format.md index b6b57cec..137eddd9 100644 --- a/doc/format.md +++ b/doc/format.md @@ -364,19 +364,21 @@ If only one of these features is found in a function, the rule will not match. capa matches features at multiple scopes, starting small (e.g., `instruction`) and growing large (e.g., `file`). In static analysis, scopes grow from `instruction`, to `basic block`, `function`, and then `file`. 
In dynamic analysis, scopes grow from `call`, to `thread`, `process`, and then to `file`: -| scope | best for... | -|-------------|------------------------------------------------------------------------------------------| -| (static) | --- | -| instruction | specific combinations of mnemonics, operands, constants, etc. to find magic values | -| basic block | closely related instructions, such as structure access or function call arguments | -| function | collections of API calls, constants, etc. that suggest complete capabilities | -| (dynamic) | --- | -| call | single API call and its arguments | -| thread | sequence of related API calls | -| process | combinations of other capabilities found within a (potentially multi-threaded) program | -| (common) | --- | -| file | high level conclusions, like encryptor, backdoor, or statically linked with some library | -| global | the features available at every scope, like architechture or OS | +| static scope | best for... | +|--------------|------------------------------------------------------------------------------------------| +| instruction | specific combinations of mnemonics, operands, constants, etc. to find magic values | +| basic block | closely related instructions, such as structure access or function call arguments | +| function | collections of API calls, constants, etc. that suggest complete capabilities | +| file | high level conclusions, like encryptor, backdoor, or statically linked with some library | +| global | the features available at every scope, like architechture or OS | + +| dynamic scope | best for... 
| +|---------------|------------------------------------------------------------------------------------------| +| call | single API call and its arguments | +| thread | sequence of related API calls | +| process | combinations of other capabilities found within a (potentially multi-threaded) program | +| file | high level conclusions, like encryptor, backdoor, or statically linked with some library | +| global | the features available at every scope, like architechture or OS | In general, capa collects and merges the features from lower scopes into higher scopes; for example, features extracted from individual instructions are merged into the function scope that contains the instructions. From 735535bb5721785a649d6f4a5a6df0b3c86f8921 Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Wed, 29 Nov 2023 13:19:20 +0000 Subject: [PATCH 13/15] format: try to express scoping for features --- doc/format.md | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/doc/format.md b/doc/format.md index 137eddd9..9aa48b82 100644 --- a/doc/format.md +++ b/doc/format.md @@ -384,27 +384,27 @@ In general, capa collects and merges the features from lower scopes into higher for example, features extracted from individual instructions are merged into the function scope that contains the instructions. This way, you can use the match results against instructions ("the constant X is for crypto algorithm Y") to recognize function-level capabilities ("crypto function Z"). 
-| feature | static scope | dynamic scope | -|-----------------------------------|--------------|---------------| -| [api](#api) | instruction | call | -| [string](#string-and-substring) | instruction | call | -| [bytes](#bytes) | instruction | call | -| [number](#number) | instruction | call | -| [characteristic](#characteristic) | instruction | - | -| [mnemonic](#mnemonic) | instruction | - | -| [operand](#operand) | instruction | - | -| [offset](#offset) | instruction | - | -| [com](#com) | instruction | - | -| [namespace](#namespace) | instruction | - | -| [class](#class) | instruction | - | -| [property](#property) | instruction | - | -| [export](#export) | file | file | -| [import](#import) | file | file | -| [section](#section) | file | file | -| [function-name](#function-name) | file | - | -| [os](#os) | global | global | -| [arch](#arch) | global | global | -| [format](#format) | global | global | +| feature | static scope | dynamic scope | +|-----------------------------------|---------------------------------------------|--------------------------------| +| [api](#api) | instruction ↦ basic block ↦ function ↦ file | call ↦ thread ↦ process ↦ file | +| [string](#string-and-substring) | instruction ↦ ... | call ↦ ... | +| [bytes](#bytes) | instruction ↦ ... | call ↦ ... | +| [number](#number) | instruction ↦ ... | call ↦ ... | +| [characteristic](#characteristic) | instruction ↦ ... | - | +| [mnemonic](#mnemonic) | instruction ↦ ... | - | +| [operand](#operand) | instruction ↦ ... | - | +| [offset](#offset) | instruction ↦ ... | - | +| [com](#com) | instruction ↦ ... | - | +| [namespace](#namespace) | instruction ↦ ... | - | +| [class](#class) | instruction ↦ ... | - | +| [property](#property) | instruction ↦ ... 
| - | +| [export](#export) | file | file | +| [import](#import) | file | file | +| [section](#section) | file | file | +| [function-name](#function-name) | file | - | +| [os](#os) | global | global | +| [arch](#arch) | global | global | +| [format](#format) | global | global | ## static analysis scopes From aa731af73c62b706ae80536b57ada6e39be53934 Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Wed, 29 Nov 2023 14:20:07 +0100 Subject: [PATCH 14/15] Update doc/format.md Co-authored-by: Moritz --- doc/format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/format.md b/doc/format.md index 9aa48b82..2aa194ad 100644 --- a/doc/format.md +++ b/doc/format.md @@ -378,7 +378,7 @@ capa matches features at multiple scopes, starting small (e.g., `instruction`) a | thread | sequence of related API calls | | process | combinations of other capabilities found within a (potentially multi-threaded) program | | file | high level conclusions, like encryptor, backdoor, or statically linked with some library | -| global | the features available at every scope, like architechture or OS | +| global | the features available at every scope, like architecture or OS | In general, capa collects and merges the features from lower scopes into higher scopes; for example, features extracted from individual instructions are merged into the function scope that contains the instructions. 
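
The feature table introduced in PATCH 13 can also be read as a lookup from feature type to the lowest scope at which it is extracted in each flavor. The sketch below transcribes that table into a Python dictionary. The names `FEATURE_SCOPES`, `features_available`, and `missing_in_dynamic` are hypothetical helpers for illustration and are not part of capa; `None` stands for the `-` entries in the table.

```python
# feature -> (lowest static scope, lowest dynamic scope); None means "not available".
FEATURE_SCOPES = {
    "api": ("instruction", "call"),
    "string": ("instruction", "call"),
    "bytes": ("instruction", "call"),
    "number": ("instruction", "call"),
    "characteristic": ("instruction", None),
    "mnemonic": ("instruction", None),
    "operand": ("instruction", None),
    "offset": ("instruction", None),
    "com": ("instruction", None),
    "namespace": ("instruction", None),
    "class": ("instruction", None),
    "property": ("instruction", None),
    "export": ("file", "file"),
    "import": ("file", "file"),
    "section": ("file", "file"),
    "function-name": ("file", None),
    "os": ("global", "global"),
    "arch": ("global", "global"),
    "format": ("global", "global"),
}


def features_available(flavor):
    """List the feature types that the table marks as available for a flavor."""
    index = {"static": 0, "dynamic": 1}[flavor]
    return sorted(name for name, scopes in FEATURE_SCOPES.items() if scopes[index] is not None)


def missing_in_dynamic(feature_names):
    """Return the feature types from a rule that have no dynamic scope in the table."""
    return [name for name in feature_names if FEATURE_SCOPES[name][1] is None]


if __name__ == "__main__":
    print(features_available("dynamic"))
    print(missing_in_dynamic(["api", "mnemonic", "operand"]))
```

A rule author could use a check like `missing_in_dynamic` as a hint for when `scopes.dynamic: unsupported` is appropriate, for example for rules built on `mnemonic` and `operand` features.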
From 8c565178c2fc197adbf225164aa591947a4cf926 Mon Sep 17 00:00:00 2001 From: Willi Ballenthin Date: Wed, 29 Nov 2023 14:21:26 +0100 Subject: [PATCH 15/15] Update doc/format.md Co-authored-by: Moritz --- doc/format.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/format.md b/doc/format.md index 2aa194ad..8d1256a6 100644 --- a/doc/format.md +++ b/doc/format.md @@ -370,7 +370,7 @@ capa matches features at multiple scopes, starting small (e.g., `instruction`) a | basic block | closely related instructions, such as structure access or function call arguments | | function | collections of API calls, constants, etc. that suggest complete capabilities | | file | high level conclusions, like encryptor, backdoor, or statically linked with some library | -| global | the features available at every scope, like architechture or OS | +| global | the features available at every scope, like architecture or OS | | dynamic scope | best for... | |---------------|------------------------------------------------------------------------------------------|