Skip to content

Commit

Permalink
Merge pull request #1697 from mandiant/dynamic-feature-extraction
Browse files Browse the repository at this point in the history
add dynamic analysis
  • Loading branch information
mr-tz authored Nov 29, 2023
2 parents a236a95 + 47019e4 commit 4c3586b
Show file tree
Hide file tree
Showing 86 changed files with 6,623 additions and 1,284 deletions.
18 changes: 11 additions & 7 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -39,13 +39,13 @@ jobs:
- name: Lint with ruff
run: pre-commit run ruff
- name: Lint with isort
run: pre-commit run isort
run: pre-commit run isort --show-diff-on-failure
- name: Lint with black
run: pre-commit run black
run: pre-commit run black --show-diff-on-failure
- name: Lint with flake8
run: pre-commit run flake8
run: pre-commit run flake8 --hook-stage manual
- name: Check types with mypy
run: pre-commit run mypy
run: pre-commit run mypy --hook-stage manual

rule_linter:
runs-on: ubuntu-20.04
Expand Down Expand Up @@ -95,6 +95,10 @@ jobs:
run: sudo apt-get install -y libyaml-dev
- name: Install capa
run: pip install -e .[dev]
- name: Run tests (fast)
# this set of tests runs about 80% of the cases in 20% of the time,
# and should catch most errors quickly.
run: pre-commit run pytest-fast --all-files --hook-stage manual
- name: Run tests
run: pytest -v tests/

Expand All @@ -103,7 +107,7 @@ jobs:
env:
BN_SERIAL: ${{ secrets.BN_SERIAL }}
runs-on: ubuntu-20.04
needs: [code_style, rule_linter]
needs: [tests]
strategy:
fail-fast: false
matrix:
Expand Down Expand Up @@ -143,7 +147,7 @@ jobs:
ghidra-tests:
name: Ghidra tests for ${{ matrix.python-version }}
runs-on: ubuntu-20.04
needs: [code_style, rule_linter]
needs: [tests]
strategy:
fail-fast: false
matrix:
Expand Down Expand Up @@ -197,4 +201,4 @@ jobs:
cat ../output.log
exit_code=$(cat ../output.log | grep exit | awk '{print $NF}')
exit $exit_code
2 changes: 2 additions & 0 deletions .gitmodules
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
[submodule "rules"]
path = rules
url = ../capa-rules.git
branch = dynamic-syntax
[submodule "tests/data"]
path = tests/data
url = ../capa-testfiles.git
branch = dynamic-feature-extractor
28 changes: 23 additions & 5 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ repos:
hooks:
- id: isort
name: isort
stages: [commit, push]
stages: [commit, push, manual]
language: system
entry: isort
args:
Expand All @@ -45,7 +45,7 @@ repos:
hooks:
- id: black
name: black
stages: [commit, push]
stages: [commit, push, manual]
language: system
entry: black
args:
Expand All @@ -62,7 +62,7 @@ repos:
hooks:
- id: ruff
name: ruff
stages: [commit, push]
stages: [commit, push, manual]
language: system
entry: ruff
args:
Expand All @@ -79,7 +79,7 @@ repos:
hooks:
- id: flake8
name: flake8
stages: [commit, push]
stages: [push, manual]
language: system
entry: flake8
args:
Expand All @@ -97,7 +97,7 @@ repos:
hooks:
- id: mypy
name: mypy
stages: [commit, push]
stages: [push, manual]
language: system
entry: mypy
args:
Expand All @@ -109,3 +109,21 @@ repos:
- "tests/"
always_run: true
pass_filenames: false

- repo: local
hooks:
- id: pytest-fast
name: pytest (fast)
stages: [manual]
language: system
entry: pytest
args:
- "tests/"
- "--ignore=tests/test_binja_features.py"
- "--ignore=tests/test_ghidra_features.py"
- "--ignore=tests/test_ida_features.py"
- "--ignore=tests/test_viv_features.py"
- "--ignore=tests/test_main.py"
- "--ignore=tests/test_scripts.py"
always_run: true
pass_filenames: false
19 changes: 15 additions & 4 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,14 +3,25 @@
## master (unreleased)

### New Features
- ghidra: add Ghidra feature extractor and supporting code #1770 @colton-gabertan
- ghidra: add entry script helping users run capa against a loaded Ghidra database #1767 @mike-hunhoff
- add Ghidra backend #1770 #1767 @colton-gabertan @mike-hunhoff
- add dynamic analysis via CAPE sandbox reports #48 #1535 @yelhamer
- add call scope #771 @yelhamer
- add thread scope #1517 @yelhamer
- add process scope #1517 @yelhamer
- rules: change `meta.scope` to `meta.scopes` @yelhamer
- protobuf: add `Metadata.flavor` @williballenthin
- binja: add support for forwarded exports #1646 @xusheng6
- binja: add support for symtab names #1504 @xusheng6
- add com class/interface features #322 @Aayush-goel-04

### Breaking Changes

- remove the `SCOPE_*` constants in favor of the `Scope` enum #1764 @williballenthin
- protobuf: deprecate `RuleMetadata.scope` in favor of `RuleMetadata.scopes` @williballenthin
- protobuf: deprecate `Metadata.analysis` in favor of `Metadata.analysis2` that is dynamic analysis aware @williballenthin
- update freeze format to v3, adding support for dynamic analysis @williballenthin
- extractor: ignore DLL name for api features #1815 @mr-tz

### New Rules (34)

- nursery/get-ntoskrnl-base-address @mr-tz
Expand Down Expand Up @@ -49,9 +60,9 @@
-

### Bug Fixes
- ghidra: fix ints_to_bytes performance #1761 @mike-hunhoff
- ghidra: fix `ints_to_bytes` performance #1761 @mike-hunhoff
- binja: improve function call site detection @xusheng6
- binja: use binaryninja.load to open files @xusheng6
- binja: use `binaryninja.load` to open files @xusheng6
- binja: bump binja version to 3.5 #1789 @xusheng6

### capa explorer IDA Pro plugin
Expand Down
127 changes: 108 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
[![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE.txt)

capa detects capabilities in executable files.
You run it against a PE, ELF, .NET module, or shellcode file and it tells you what it thinks the program can do.
You run it against a PE, ELF, .NET module, shellcode file, or a sandbox report and it tells you what it thinks the program can do.
For example, it might suggest that the file is a backdoor, is capable of installing services, or relies on HTTP to communicate.

Check out:
Expand Down Expand Up @@ -125,6 +125,96 @@ function @ 0x4011C0
...
```

Additionally, capa also supports analyzing [CAPE](https://github.com/kevoreilly/CAPEv2) sandbox reports for dynamic capabilty extraction.
In order to use this, you first submit your sample to CAPE for analysis, and then run capa against the generated report (JSON).

Here's an example of running capa against a packed binary, and then running capa against the CAPE report of that binary:

```yaml
$ capa 05be49819139a3fdcdbddbdefd298398779521f3d68daa25275cc77508e42310.exe
WARNING:capa.capabilities.common:--------------------------------------------------------------------------------
WARNING:capa.capabilities.common: This sample appears to be packed.
WARNING:capa.capabilities.common:
WARNING:capa.capabilities.common: Packed samples have often been obfuscated to hide their logic.
WARNING:capa.capabilities.common: capa cannot handle obfuscation well using static analysis. This means the results may be misleading or incomplete.
WARNING:capa.capabilities.common: If possible, you should try to unpack this input file before analyzing it with capa.
WARNING:capa.capabilities.common: Alternatively, run the sample in a supported sandbox and invoke capa against the report to obtain dynamic analysis results.
WARNING:capa.capabilities.common:
WARNING:capa.capabilities.common: Identified via rule: (internal) packer file limitation
WARNING:capa.capabilities.common:
WARNING:capa.capabilities.common: Use -v or -vv if you really want to see the capabilities identified by capa.
WARNING:capa.capabilities.common:--------------------------------------------------------------------------------

$ capa 05be49819139a3fdcdbddbdefd298398779521f3d68daa25275cc77508e42310.json

┍━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ ATT&CK Tactic │ ATT&CK Technique │
┝━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ CREDENTIAL ACCESS │ Credentials from Password Stores T1555 │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ DEFENSE EVASION │ File and Directory Permissions Modification T1222 │
│ │ Modify Registry T1112 │
│ │ Obfuscated Files or Information T1027 │
│ │ Virtualization/Sandbox Evasion::User Activity Based Checks T1497.002 │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ DISCOVERY │ Account Discovery T1087 │
│ │ Application Window Discovery T1010 │
│ │ File and Directory Discovery T1083 │
│ │ Query Registry T1012 │
│ │ System Information Discovery T1082 │
│ │ System Location Discovery::System Language Discovery T1614.001 │
│ │ System Owner/User Discovery T1033 │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ EXECUTION │ System Services::Service Execution T1569.002 │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ PERSISTENCE │ Boot or Logon Autostart Execution::Registry Run Keys / Startup Folder T1547.001 │
│ │ Boot or Logon Autostart Execution::Winlogon Helper DLL T1547.004 │
│ │ Create or Modify System Process::Windows Service T1543.003 │
┕━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙

┍━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ Capability │ Namespace │
┝━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ check for unmoving mouse cursor (3 matches) │ anti-analysis/anti-vm/vm-detection │
│ gather bitkinex information │ collection/file-managers │
│ gather classicftp information │ collection/file-managers │
│ gather filezilla information │ collection/file-managers │
│ gather total-commander information │ collection/file-managers │
│ gather ultrafxp information │ collection/file-managers │
│ resolve DNS (23 matches) │ communication/dns │
│ initialize Winsock library (7 matches) │ communication/socket │
│ act as TCP client (3 matches) │ communication/tcp/client │
│ create new key via CryptAcquireContext │ data-manipulation/encryption │
│ encrypt or decrypt via WinCrypt │ data-manipulation/encryption │
│ hash data via WinCrypt │ data-manipulation/hashing │
│ initialize hashing via WinCrypt │ data-manipulation/hashing │
│ hash data with MD5 │ data-manipulation/hashing/md5 │
│ generate random numbers via WinAPI │ data-manipulation/prng │
│ extract resource via kernel32 functions (2 matches) │ executable/resource │
│ interact with driver via control codes (2 matches) │ host-interaction/driver │
│ get Program Files directory (18 matches) │ host-interaction/file-system │
│ get common file path (575 matches) │ host-interaction/file-system │
│ create directory (2 matches) │ host-interaction/file-system/create │
│ delete file │ host-interaction/file-system/delete │
│ get file attributes (122 matches) │ host-interaction/file-system/meta │
│ set file attributes (8 matches) │ host-interaction/file-system/meta │
│ move file │ host-interaction/file-system/move │
│ find taskbar (3 matches) │ host-interaction/gui/taskbar/find │
│ get keyboard layout (12 matches) │ host-interaction/hardware/keyboard │
│ get disk size │ host-interaction/hardware/storage │
│ get hostname (4 matches) │ host-interaction/os/hostname │
│ allocate or change RWX memory (3 matches) │ host-interaction/process/inject │
│ query or enumerate registry key (3 matches) │ host-interaction/registry │
│ query or enumerate registry value (8 matches) │ host-interaction/registry │
│ delete registry key │ host-interaction/registry/delete │
│ start service │ host-interaction/service/start │
│ get session user name │ host-interaction/session │
│ persist via Run registry key │ persistence/registry/run │
│ persist via Winlogon Helper DLL registry key │ persistence/registry/winlogon-helper │
│ persist via Windows service (2 matches) │ persistence/service │
┕━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙
```

capa uses a collection of rules to identify capabilities within a program.
These rules are easy to write, even for those new to reverse engineering.
By authoring rules, you can extend the capabilities that capa recognizes.
Expand All @@ -135,31 +225,30 @@ Here's an example rule used by capa:
```yaml
rule:
meta:
name: hash data with CRC32
namespace: data-manipulation/checksum/crc32
name: create TCP socket
namespace: communication/socket/tcp
authors:
- [email protected]
scope: function
- [email protected]
- [email protected]
- [email protected]
scopes:
static: basic block
dynamic: call
mbc:
- Data::Checksum::CRC32 [C0032.001]
- Communication::Socket Communication::Create TCP Socket [C0001.011]
examples:
- 2D3EDC218A90F03089CC01715A9F047F:0x403CBD
- 7D28CB106CB54876B2A5C111724A07CD:0x402350 # RtlComputeCrc32
- 7EFF498DE13CC734262F87E6B3EF38AB:0x100084A6
- Practical Malware Analysis Lab 01-01.dll_:0x10001010
features:
- or:
- and:
- mnemonic: shr
- number: 6 = IPPROTO_TCP
- number: 1 = SOCK_STREAM
- number: 2 = AF_INET
- or:
- number: 0xEDB88320
- bytes: 00 00 00 00 96 30 07 77 2C 61 0E EE BA 51 09 99 19 C4 6D 07 8F F4 6A 70 35 A5 63 E9 A3 95 64 9E = crc32_tab
- number: 8
- characteristic: nzxor
- and:
- number: 0x8320
- number: 0xEDB8
- characteristic: nzxor
- api: RtlComputeCrc32
- api: ws2_32.socket
- api: ws2_32.WSASocket
- api: socket
- property/read: System.Net.Sockets.TcpClient::Client
```
The [github.com/mandiant/capa-rules](https://github.com/mandiant/capa-rules) repository contains hundreds of standard library rules that are distributed with capa.
Expand Down
Empty file added capa/capabilities/__init__.py
Empty file.
Loading

0 comments on commit 4c3586b

Please sign in to comment.